In this brief extension worksheet, we look at why kappa is sometimes much lower than percentage agreement, and also why the `kappa2` command sometimes prints `NaN` for *Z* and *p*.

To illustrate these things, here are some example ratings, and the output they produce:

```
# A tibble: 5 x 3
  subject rater1 rater2
    <int>  <int>  <int>
1       1      3      3
2       2      3      4
3       3      3      3
4       4      3      3
5       5      3      3
```

```
 Percentage agreement (Tolerance=0)

 Subjects = 5
   Raters = 2
  %-agree = 80
```

```
 Cohen's Kappa for 2 Raters (Weights: unweighted)

 Subjects = 5
   Raters = 2
    Kappa = 0

        z = NaN
  p-value = NaN
```
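For reference, output like the above can be produced with the `agree()` and `kappa2()` functions from the `irr` package (a sketch assuming that package is installed; the column names match the tibble above):

```r
library(irr)  # assumed installed; provides agree() and kappa2()

# The example ratings from the tibble above
ratings <- data.frame(
  rater1 = c(3, 3, 3, 3, 3),
  rater2 = c(3, 4, 3, 3, 3)
)

agree(ratings, tolerance = 0)   # percentage agreement
kappa2(ratings)                 # unweighted Cohen's kappa
```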

It might seem odd to have a kappa of zero here, because the percentage agreement is quite high (80%). Recall that Cohen’s kappa is calculated as:

*(P - C) / (100 - C)*

where *P* is the percentage agreement between the two raters, and *C* is the percentage agreement we’d expect by chance. So, for kappa to be zero, the percentage agreement by chance must also be 80%.
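We can confirm this by plugging the numbers into the formula (a minimal sketch in base R; the function name `cohens_kappa` is just for illustration):

```r
# Cohen's kappa from observed (P) and chance (C) percentage agreement
cohens_kappa <- function(P, C) (P - C) / (100 - C)

# With 80% observed agreement and 80% chance agreement:
cohens_kappa(P = 80, C = 80)  # (80 - 80) / (100 - 80) = 0 / 20 = 0

# For comparison, 90% observed agreement against the same 80% chance level:
cohens_kappa(P = 90, C = 80)  # 0.5
```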

Agreement by chance is so high here because Rater 1 gives the same response every time, and Rater 2 gives that same response 80% of the time. If one person always makes the same rating, and the other makes that rating on a random 80% of occasions, they'll agree 80% of the time. For example, if I call everything I see a cat, and you call everything you see a cat unless you roll a five on a five-sided die, we'll agree 80% of the time. This does not mean either of us knows what a cat is.
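We can check this chance-agreement figure from each rater's marginal proportions (a by-hand sketch in base R, not the `irr` package's own code):

```r
rater1 <- c(3, 3, 3, 3, 3)
rater2 <- c(3, 4, 3, 3, 3)

# Proportion of trials on which each rater uses each category
p1 <- table(factor(rater1, levels = 3:4)) / length(rater1)  # 3: 1.0, 4: 0.0
p2 <- table(factor(rater2, levels = 3:4)) / length(rater2)  # 3: 0.8, 4: 0.2

# Chance agreement: probability that both raters pick the same category
chance <- sum(p1 * p2)  # 1.0 * 0.8 + 0.0 * 0.2
chance * 100            # 80
```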

`NaN`

In this case, `NaN` doesn't mean grandmother, it means 'not a number'. What's happened here is that there is so little variation in the ratings (they are nearly all '3') that R cannot calculate the *Z* score or the *p* value; the calculations used to do this break down in these extreme cases.
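To see the flavour of the failure: the *Z* score is kappa divided by its standard error, and when there is essentially no variation in the ratings, both quantities can end up as zero. In R, zero divided by zero returns `NaN` rather than raising an error (a minimal illustration, not the `irr` package's actual code):

```r
# In R, 0/0 is undefined, so it quietly returns NaN
0 / 0           # NaN
is.nan(0 / 0)   # TRUE

# A z score of kappa / SE(kappa) with both terms zero breaks the same way
kappa_est <- 0
se_est <- 0
kappa_est / se_est  # NaN
```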

This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.