In this brief extension worksheet, we look at why kappa is sometimes much lower than percentage agreement, and also why the kappa2 command sometimes prints NaN for Z and p.

To illustrate these things, here are some example ratings, and the output they produce:

# A tibble: 5 x 3
  subject rater1 rater2
    <int>  <int>  <int>
1       1      3      3
2       2      3      4
3       3      3      3
4       4      3      3
5       5      3      3

 Percentage agreement (Tolerance=0)

 Subjects = 5 
   Raters = 2 
  %-agree = 80 

 Cohen's Kappa for 2 Raters (Weights: unweighted)

 Subjects = 5 
   Raters = 2 
    Kappa = 0 

        z = NaN 
  p-value = NaN 

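For reference, output like this can be produced with the agree and kappa2 commands from the irr package. Here is one way to do it - a sketch, assuming the tidyverse and irr packages are loaded as in the main worksheet, and that the ratings are stored in a data frame called ratings (the exact names used in the original worksheet may differ):

library(tidyverse)
library(irr)

# Example ratings: Rater 1 gives '3' on every trial; Rater 2 gives '3'
# on four of the five trials, and '4' on the other one.
ratings <- tibble(
  subject = 1:5,
  rater1  = c(3L, 3L, 3L, 3L, 3L),
  rater2  = c(3L, 4L, 3L, 3L, 3L)
)
ratings

# Percentage agreement and Cohen's kappa, calculated on just the
# two columns of ratings.
ratings %>% select(rater1, rater2) %>% agree()
ratings %>% select(rater1, rater2) %>% kappa2()
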
Why is kappa zero in this case?

It might seem odd to have a kappa of zero here, because the percentage agreement is quite high (80%). Recall that Cohen’s kappa is calculated as:

(P - C) / (100 - C)

where P is the percentage agreement between the two raters, and C is the percentage agreement we’d expect by chance. So, for kappa to be zero, the percentage agreement by chance must also be 80%.
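
Plugging the numbers from this example into the formula, with C = 80:

(80 - 80) / (100 - 80)  =  0 / 20  =  0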

Agreement by chance is so high here because Rater 1 is using the same response all the time, and Rater 2 is using that same response 80% of the time. If one person always makes the same rating, and the other makes that rating on a random 80% of occasions, they’ll agree 80% of the time. For example, if I call everything I see a cat, and you call everything you see a cat unless you roll a five on your five-sided die, we’ll agree 80% of the time. This does not mean either of us knows what a cat is.
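
To see where that 80% chance agreement comes from, here is a short by-hand version of the calculation in base R (a sketch; the kappa2 command does this kind of calculation for you):

# Ratings from the example above
rater1 <- c(3, 3, 3, 3, 3)
rater2 <- c(3, 4, 3, 3, 3)

# How often does each rater use each of the two ratings that appear?
p1_3 <- mean(rater1 == 3)    # 1.0 - Rater 1 says '3' every time
p2_3 <- mean(rater2 == 3)    # 0.8 - Rater 2 says '3' 80% of the time
p1_4 <- mean(rater1 == 4)    # 0.0 - Rater 1 never says '4'
p2_4 <- mean(rater2 == 4)    # 0.2 - Rater 2 says '4' 20% of the time

# Chance agreement: the probability that both happen to say '3',
# plus the probability that both happen to say '4'
chance <- (p1_3 * p2_3 + p1_4 * p2_4) * 100    # 80

# Observed percentage agreement
observed <- mean(rater1 == rater2) * 100       # 80

# Cohen's kappa, using the formula above
(observed - chance) / (100 - chance)           # 0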

Why are Z and p equal to NaN?

In this case, NaN doesn’t mean grandmother; it means ‘not a number’. What’s happened here is that there is so little variation in the ratings (they are nearly all ‘3’) that R cannot calculate the Z score or the p value: the calculations used to produce them break down in these extreme cases.
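
If you are curious what NaN looks like when it arises, here are two simple calculations in R that have no well-defined numeric answer. We haven't traced the internals of kappa2 here, but it is calculations of this sort, applied to ratings with almost no variation, that end up producing NaN for z and the p-value:

0 / 0       # NaN: zero divided by zero is not a number
sqrt(-1)    # NaN (plus a warning): a negative number has no real square root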


This material is distributed under a Creative Commons licence (CC-BY-SA 4.0).