Intercoder Reliability

Calculating your intercoder reliability in Delve.

Written by Alex Limpaecher

What is Intercoder Reliability?

Intercoder reliability is a measure of agreement or consistency among two or more coders who independently code a transcript. It measures whether the coders apply the same codes to the same text in the same way.

What are the steps to get intercoder reliability in Delve?

To calculate an intercoder reliability score in Delve, you will need two or more researchers to code the same transcript. Follow the steps below to calculate intercoder reliability.

Step 1: Create a project with a codebook

Intercoder reliability measures how consistently your team applies the same codebook. So before you begin coding, you will need a codebook with well-defined code descriptions.

Step 2: Invite Your Team to the Same Project

After creating the project with a codebook, invite your team to the project. You can invite them using the share button in the upper right-hand corner.

Step 3: Code the Transcript Using the Coded By Me Feature

Your research team should code the same transcript without looking at each other's work. They can do that using the "Coded By Me" feature, which hides the work of all other team members.

Note: As a team, you should agree on whether coders are allowed to add more codes to the codebook. Intercoder reliability is most appropriate for deductive coding exercises, where the team does not add new codes to the codebook.

Step 4: Calculate your intercoder reliability score

To calculate intercoder reliability, use the Transcript Navigation Dropdown and click on Coding Comparison.

There you will see a button for Inter Coder Reliability.

Clicking it will provide you with your intercoder reliability score for that transcript.

How is the Intercoder Reliability Score Calculated?

The intercoder reliability score is calculated using Krippendorff's Alpha, a standard statistical measure of intercoder reliability. Krippendorff's Alpha ranges between -1 and 1: 1 indicates perfect agreement, 0 is considered no better than chance, and scores below 0 are considered worse than chance.
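
For readers who want to see the mechanics, here is a minimal Python sketch of Krippendorff's Alpha with identity ("nominal") weighting. It illustrates the general formula only; it is not Delve's actual implementation, and the data layout (one list of codes per unit of text) is an assumption made for the example.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha with identity ("nominal") weighting.

    `units` is a list of lists: each inner list holds the codes that
    different coders applied to one unit of text. This layout is an
    assumption for the example, not how Delve stores coding.
    """
    # Coincidence counts o_ck: how often codes c and k were paired
    # on the same unit by different coders.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # text coded by fewer than two people is ignored
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    n_total = sum(coincidences.values())
    if n_total <= 1:
        return None  # not enough overlapping coding to compute a score

    # Marginal totals n_c for each code.
    marginals = Counter()
    for (c, _k), count in coincidences.items():
        marginals[c] += count

    # Identity weighting: any pair of different codes is full disagreement.
    observed = sum(v for (c, k), v in coincidences.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c, k in permutations(marginals, 2)) / (n_total - 1)
    if expected == 0:
        return None  # only one code was used, so alpha is undefined

    return 1 - observed / expected
```

On toy data, `krippendorff_alpha_nominal([["Trust", "Trust"], ["Pricing", "Pricing"]])` returns 1.0, since both hypothetical coders agree on every unit, while units where coders disagree pull the score down toward 0.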

What are the benefits of Krippendorff's Alpha?

Krippendorff's Alpha has a number of benefits over other intercoder reliability scores such as percent agreement. They include:

  1. Any number of coders: You can have any number of coders code a transcript, and Delve's implementation of Krippendorff's Alpha will incorporate all of their coding in the score.

  2. Handling of Missing Data: Krippendorff's Alpha can handle missing data. That means your team does not need to code exactly the same snippets to get a valid score.

  3. Adjustment for Chance: Krippendorff's Alpha adjusts for the level of agreement that could be expected by chance.

What is a good intercoder reliability score?

A good intercoder reliability score depends on the type of research you are conducting and how the score is being used. That being said, a score of 0.8 or higher is commonly cited as demonstrating consistency.

Do my team's snippets need to perfectly overlap to be included in the score?

No, your team's snippets do not need to perfectly overlap to be included in the intercoder reliability calculation. Delve takes overlapping snippets into account when calculating the score.

What happens if only one person codes a piece of text?

If only one person codes a piece of text, it is excluded from the Krippendorff's Alpha score. Only text that is coded by at least two coders is included in the score. Any text with one or zero coders is essentially ignored by the calculation.

In what scenarios can the intercoder reliability score not be calculated?

There are a few scenarios where the intercoder reliability score cannot be calculated. They are all scenarios where not enough of the transcript has been coded (the sketch after this list illustrates each case):

  1. Only one person (or nobody) has coded a transcript.

  2. Two or more people have coded a transcript, but their coded text does not overlap anywhere.

  3. Two or more people have coded a transcript, but they have only used one code.
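
To make these concrete, here is how the hypothetical krippendorff_alpha_nominal sketch from earlier behaves on minimal, made-up inputs:

```python
# Reusing the krippendorff_alpha_nominal sketch defined above (illustrative data only).
print(krippendorff_alpha_nominal([["Trust"]]))                # None: only one coder
print(krippendorff_alpha_nominal([["Trust"], ["Pricing"]]))   # None: coded text never overlaps
print(krippendorff_alpha_nominal([["Trust", "Trust"]]))       # None: only one code used
```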

What are Krippendorff's Alpha Weights and how are they used in Delve?

In the generic Krippendorff's Alpha formula, there is a concept of "weights". Weights are essentially "partial credit", where coders may get an increase in score if they are "close enough".

Delve uses "identity" weighting, which means there is no "partial credit". The alpha score only increases if two coders code the same segment in exactly the same way.
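
In code terms, identity weighting boils down to an all-or-nothing disagreement function, roughly like the sketch below (an illustration, not Delve's internals):

```python
def identity_disagreement(code_a, code_b):
    """Identity weighting: full agreement (0) or full disagreement (1), no partial credit."""
    return 0 if code_a == code_b else 1
```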

How does it impact the intercoder reliability score if a researcher codes a segment with more than one code?

If you are conducting a strict intercoder reliability test, we recommend that your team create a codebook where codes are conceptually separate. That way, your team can take an approach where you assign only one code to a particular piece of text.

That being said, our implementation of Krippendorff's Alpha can handle simultaneous coding, where you apply multiple codes to the same piece of text. As mentioned above, Delve uses "identity" weighting. So if one person codes a piece of text with two codes, another person will only be in agreement if they code the same piece of text with the same two codes.
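
One way to picture this is to treat the full set of codes on a segment as a single value and compare the sets directly (a hypothetical representation, not necessarily how Delve stores codes):

```python
# Hypothetical multi-coded segment from two coders.
coder_one = frozenset({"Trust", "Pricing"})
coder_two = frozenset({"Trust"})

# Under identity weighting, the segment only counts as agreement
# when the code sets are exactly the same.
print(coder_one == coder_two)                        # False: no partial credit
print(coder_one == frozenset({"Pricing", "Trust"}))  # True: order does not matter
```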

Should my team use Intercoder Reliability?

No, your team should not necessarily use intercoder reliability. Not all qualitative researchers agree that intercoder reliability should be universally used. In their book "Thematic Analysis: A Practical Guide," Virginia Braun and Victoria Clarke state that they believe coding reliability procedures such as intercoder reliability can "lead to themes that are relatively superficial and underdeveloped" (page 240). Underpinning the concept of intercoder reliability is the idea that researcher subjectivity is an undesirable bias. On the contrary, researcher subjectivity is viewed by many qualitative researchers as an inevitable and even valuable aspect of qualitative research, providing unique insights and depth to the analysis.

Instead of using intercoder reliability, consider using Delve's code comparison feature, where you can compare and discuss each other's work without reducing your unique perspectives to a quantitative score.

When should my team use intercoder reliability?

There are no hard and fast rules for when to use intercoder reliability. However, here are a few scenarios where intercoder reliability could be helpful to your research analysis:

  1. Required for publication or reporting: Some journals or evaluative frameworks require or suggest the use of intercoder reliability.

  2. Training new researchers: Intercoder reliability can be a helpful way to quickly orient and assess a new member of your team. A high intercoder reliability score may show that the new researcher understands your team's codebook.

  3. Deductive Qualitative Analysis: Intercoder reliability tends to be more helpful for deductive qualitative analysis, where the codebook is predefined by a framework or existing theory. In contrast, when intercoder reliability is applied to inductive qualitative analysis, it risks suppressing the process of generating new concepts and themes.
