## Before you start…

Before starting this exercise, you should have completed all the relevant worksheets from Absolute Beginners’ Parts 1 & 2. Each section below indicates which of the earlier worksheets are relevant.

## Getting the data into R

Relevant worksheet: Intro to RStudio

You and your partner must first complete the behaviour coding exercise. You’ll then get a CSV file that contains both your ratings.

Once you have your CSV file, open a project on RStudio Server for this analysis, create a script file, and upload your CSV to your project.

Plymouth University students: Create/open your project named psyc416; within that create a script file called lions.R. Enter all commands into that script and run them from there.

Relevant worksheet: Exploring data

```r
library(tidyverse)
# Replace animals.csv with the name of your own CSV file
animals <- read_csv("animals.csv")
```

Note: Everyone’s CSV file will have a different name. In the example above, you’ll need to replace animals.csv with the name of your personal CSV file.

### Inspect

Look at the data by clicking on it in the Environment tab in RStudio. Each row is one time point in the video you coded. Here’s what each column in the data set contains:

| Column | Description | Values |
|---|---|---|
| time | Time point. | 1-10 |
| period | How long before feeding time? (in minutes) | 10 or 180 |
| behav.r1 | Rater 1’s coding of the animal’s behaviour at each time point. | In the example file, you’ll find: “pacing”, “sleeping”, “standing”, “lying”, “running”. Your codes may be different. |
| behav.r2 | Rater 2’s coding of the animal’s behaviour at each time point. | As above. |
| loc.r1 | Rater 1’s coding of the animal’s location at each time point. | In the example file, you’ll find: “zone_1”, “zone_2”, “zone_3”, “zone_4”. Your codes may be different. |
| loc.r2 | Rater 2’s coding of the animal’s location at each time point. | As above. |
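Before going further, it can help to check that your file has the six columns described above, and to see exactly which codes you and your partner used. The sketch below uses a small made-up data frame in place of animals (the values are hypothetical, chosen only so the example runs on its own). With your own data, skip the first step and run the remaining lines on the animals data frame you loaded with read_csv.

```r
# Hypothetical stand-in for animals: same column names as the worksheet,
# but made-up values. Skip this step when using your own data.
animals <- data.frame(
  time     = rep(1:3, times = 2),
  period   = rep(c(10, 180), each = 3),
  behav.r1 = c("pacing", "lying", "lying", "sleeping", "sleeping", "standing"),
  behav.r2 = c("pacing", "lying", "sleeping", "sleeping", "sleeping", "standing"),
  loc.r1   = c("zone_1", "zone_2", "zone_2", "zone_3", "zone_3", "zone_4"),
  loc.r2   = c("zone_1", "zone_2", "zone_2", "zone_3", "zone_4", "zone_4"),
  stringsAsFactors = FALSE
)

# Columns the worksheet expects; character(0) means none are missing
expected <- c("time", "period", "behav.r1", "behav.r2", "loc.r1", "loc.r2")
missing_cols <- setdiff(expected, names(animals))
missing_cols

# The distinct behaviour codes each rater used, in alphabetical order
sort(unique(animals$behav.r1))
sort(unique(animals$behav.r2))
```

If the two raters’ lists of codes differ (for example, one of you used “running” and the other never did), that is worth bearing in mind when you look at agreement below.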

## Percentage agreement

Relevant worksheet: Inter-rater reliability

To what extent did you and your workshop partner agree on how each behaviour should be coded? As covered in the Inter-rater reliability worksheet, we first need to select the relevant columns of the data frame. For example, to look at inter-rater reliability for the behaviour category, we select:

```r
behav <- animals %>% select(behav.r1, behav.r2)
```

We can now use the agree command to work out percentage agreement:

```r
library(irr)
agree(behav)
```

```
 Percentage agreement (Tolerance=0)

Subjects = 20
Raters = 2
%-agree = 70
```

NOTE: If you get an error here, type install.packages("irr"), wait for the package to finish installing, and try again.

The key result here is %-agree, which is your percentage agreement. The term Subjects here is a bit misleading; it doesn’t mean the number of animals you observed (this data file contains your ratings of a single animal). Instead, it means the number of time points you recorded an observation for.
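Percentage agreement is simply the proportion of time points at which both raters chose the same code, multiplied by 100. The sketch below computes it by hand for two short, made-up rating vectors (hypothetical data, not from the worksheet), which may help you see what agree is reporting.

```r
# Made-up codes from two raters at ten time points (hypothetical data)
r1 <- c("pacing", "sleeping", "sleeping", "lying", "standing",
        "pacing", "lying", "sleeping", "standing", "lying")
r2 <- c("pacing", "sleeping", "lying", "lying", "standing",
        "standing", "lying", "sleeping", "standing", "sleeping")

agreed <- r1 == r2                          # TRUE where the two codes match
pct <- 100 * sum(agreed) / length(agreed)   # percentage of matching time points
pct
```

Here the raters agree at 7 of the 10 time points, so pct is 70. With your own data, 100 * sum(animals$behav.r1 == animals$behav.r2) / nrow(animals) should give the same value as %-agree, assuming neither column has missing values (agree drops rows with missing ratings).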

Enter the percentage agreement for your behaviour and location codings into PsycEL.

## Cohen’s kappa

Relevant worksheet: Inter-rater reliability

One problem with the percentage agreement measure is that people will sometimes agree purely by chance. Jacob Cohen thought it would be much neater to have a measure of agreement where zero always means the level of agreement expected by chance, and 1 always means perfect agreement. To calculate his measure, Cohen’s kappa, in R we use the command kappa2:

```r
kappa2(behav)
```

```
 Cohen's Kappa for 2 Raters (Weights: unweighted)

Subjects = 20
Raters = 2
Kappa = 0.559

z = 4.17
p-value = 3.1e-05
```
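Cohen’s kappa compares the observed agreement, p_o, with the agreement expected by chance, p_e, which is worked out from how often each rater used each code: kappa = (p_o - p_e) / (1 - p_e). The function below is a minimal by-hand sketch of that formula; the name cohen_kappa and the rating vectors are hypothetical, made up for illustration. For unweighted kappa it should give the same value as kappa2, although kappa2 also reports a z test that this sketch leaves out.

```r
# By-hand, unweighted Cohen's kappa for two raters (illustrative sketch)
cohen_kappa <- function(r1, r2) {
  codes <- union(r1, r2)                    # every code either rater used
  tab <- table(factor(r1, levels = codes),  # square table: rater 1 on rows,
               factor(r2, levels = codes))  # rater 2 on columns
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n                       # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

# Made-up codes from two raters at ten time points (hypothetical data)
r1 <- c("pacing", "sleeping", "sleeping", "lying", "standing",
        "pacing", "lying", "sleeping", "standing", "lying")
r2 <- c("pacing", "sleeping", "lying", "lying", "standing",
        "standing", "lying", "sleeping", "standing", "sleeping")

k <- cohen_kappa(r1, r2)
round(k, 3)
```

For these ratings p_o = 0.70 and p_e = 0.26, so kappa = 0.44 / 0.74, about 0.595. That is noticeably lower than the 70% raw agreement, because some of that agreement would be expected by chance alone.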

Enter the Cohen’s kappa values for your behaviour and location codings into PsycEL.

There are some words that psychologists sometimes use to describe the level of agreement between raters, based on the value of kappa they get. These descriptions are listed in the inter-rater reliability worksheet, in the section “Describing Cohen’s kappa”.
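If you would like R to suggest a label, the helper below maps a kappa value to a descriptive term. It assumes the widely used bands from Landis and Koch (1977): below 0 poor, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and above 0.80 almost perfect. The function name describe_kappa is hypothetical, and if the Inter-rater reliability worksheet uses different cut-offs or wording, go with the worksheet.

```r
# Map a kappa value to a descriptive label.
# ASSUMPTION: Landis & Koch (1977) bands; check them against your worksheet.
describe_kappa <- function(k) {
  if (k < 0) {
    "poor"
  } else if (k <= 0.20) {
    "slight"
  } else if (k <= 0.40) {
    "fair"
  } else if (k <= 0.60) {
    "moderate"
  } else if (k <= 0.80) {
    "substantial"
  } else {
    "almost perfect"
  }
}

describe_kappa(0.559)  # the kappa value from the example output above
```

Under these bands, the example kappa of 0.559 would be described as “moderate”.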

On PsycEL, select the correct term to describe the kappa values for your behaviour and location codings.

If either of those descriptions is ‘moderate’ or lower, reflect on why that might be. For example, is there a problem with the definitions of the behavioural categories you used? What else might have caused the lack of agreement?

Write a few sentences into PsycEL summarising your reflections.

## Behaviour and feeding time

Relevant worksheet: Relationships

Does the lion behave differently when it’s close to feeding time? To find out, we need to calculate the frequency of each behaviour in our two time periods (10 minutes before feeding, and 180 minutes before feeding). You can do this with the table command from the Relationships worksheet, but first you have to choose which of your codings to look at (in the example below, behaviour) and which of your two raters’ data to use. You need to pick one rater because you and your partner probably disagreed at least a few times, and when two people coded the same moment differently, how can we decide who was ‘right’? There are a few possible solutions, but for now we will take the simplest: flip a coin to decide which rater’s data to use.
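If you don’t have a coin to hand, R can flip one for you. The sketch below picks one of the two rater columns at random with sample, then builds the table from whichever column was chosen. The small data frame is made up (hypothetical values) so that the example runs on its own; with your own data, drop the first step and use your animals data frame.

```r
# Hypothetical stand-in for animals (made-up values)
animals <- data.frame(
  period   = c(10, 10, 180, 180),
  behav.r1 = c("lying", "pacing", "sleeping", "sleeping"),
  behav.r2 = c("lying", "lying", "sleeping", "standing"),
  stringsAsFactors = FALSE
)

# A virtual coin flip: choose rater 1's or rater 2's column at random
chosen <- sample(c("behav.r1", "behav.r2"), size = 1)
chosen

# animals[[chosen]] picks out the column whose name is stored in chosen
cont <- table(animals$period, animals[[chosen]])
cont
```

Because the choice is random, you may get a different column each time you run this, so make a note of which rater was chosen and stick with it.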

If you choose “behaviour” and rater 2, the commands would be:

```r
cont <- table(animals$period, animals$behav.r2)
cont
```

```
      lying pacing sleeping standing
  10      3      1        5        1
  180     2      1        6        1
```

### Explanation of command

What you have just done, as covered in the Relationships worksheet, is to create a contingency table, called cont, from two columns of your data frame, animals. This contingency table shows how often each behaviour occurs in each time period. Recall that table(rows, columns) is the R command for producing contingency tables: we replace the word rows with the name of the variable we want on the rows of the table, and the word columns with the name of the variable we want in the columns of the table.
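To see how table(rows, columns) counts combinations, here is a tiny example with made-up vectors (hypothetical data): six time points, each with a period and a behaviour code.

```r
# Hypothetical data: period and behaviour code at six time points
period    <- c(10, 10, 10, 180, 180, 180)
behaviour <- c("lying", "lying", "pacing", "sleeping", "sleeping", "lying")

# Periods on the rows, behaviours on the columns
cont <- table(period, behaviour)
cont

# Look up one cell: how often "lying" was coded 10 minutes before feeding
cont["10", "lying"]
```

Each cell counts the time points with that particular combination of period and behaviour, so cont["10", "lying"] is 2, and combinations that never occurred (such as sleeping at 10 minutes) get a count of 0.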

### Bar chart

Relevant worksheet: Face recognition

To visualise the relationship between behaviour and feeding time, we’re going to use a bar chart. We covered bar charts in the Face recognition worksheet; here we’re going to extend that example to create a bar chart that shows our two different time periods on the same axes.

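```r
df <- data.frame(cont)  # ggplot needs a data frame, not a contingency table
colnames(df) <- c("Period", "Behaviour", "Frequency")  # meaningful column names
df %>% ggplot(aes(x = Behaviour, y = Frequency, fill = Period)) +
  geom_col(position = "dodge")  # bars for the two periods side by side
```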

### Explanation of command

This graph command goes a bit beyond what we’ve covered in previous worksheets, so here’s an explanation of how the new bits work:

df <- data.frame(cont) - A data frame is the standard way R stores data (e.g. animals is a data frame). The ggplot command expects a data frame, and gets upset if it gets something else, like a contingency table. So, the first thing we do is make a data frame version of cont (our contingency table), and give it a name (df in this case).

If you click on df in the Environment tab of RStudio, you’ll see that it has three columns: the row labels of the contingency table are in a column called “Var1”, the column labels are in a column called “Var2”, and the counts are in a column called “Freq”. These are not very meaningful labels, so we use the colnames command (short for “column names”) to give them more meaningful names, which makes our graph clearer. We do this using the command: colnames(df) <- c("Period", "Behaviour", "Frequency").
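The sketch below shows the conversion and renaming on a small made-up data frame (hypothetical values standing in for animals), so you can see the default Var1, Var2 and Freq names before colnames replaces them.

```r
# Hypothetical stand-in for animals (made-up values)
toy <- data.frame(
  period   = c(10, 10, 10, 180, 180, 180),
  behav.r2 = c("lying", "lying", "pacing", "sleeping", "sleeping", "lying"),
  stringsAsFactors = FALSE
)

cont <- table(toy$period, toy$behav.r2)  # contingency table, as in the worksheet
df <- data.frame(cont)                   # one row per period/behaviour combination
names(df)                                # default names: "Var1" "Var2" "Freq"

colnames(df) <- c("Period", "Behaviour", "Frequency")
df
```

Note that df has a row for every combination of period and behaviour, including combinations that never occurred (their Frequency is 0). Period also comes out as a factor rather than a number, which is why fill = Period gives two distinct colours rather than a colour gradient.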

ggplot(aes(x = Behaviour, y = Frequency, fill = Period)) - As in previous bar charts you’ve made, you need to tell ggplot which variable goes on the x axis and which goes on the y axis. The new bit here is fill = Period, which tells ggplot to colour the bars according to Period, giving one colour for each of the two time periods.

geom_col(position="dodge") - As before, geom_col is the command for a “column” plot (aka. a bar chart). The new part here is position="dodge"; this tells ggplot that you want the two different colours of bars to be placed side-by-side, rather than directly on top of each other (i.e. you want them to “dodge” each other).

In the example graph above, we can see that the animal was pacing and standing equally often in the two time periods, but was lying slightly more often 10 minutes before feeding time than 180 minutes before, and sleeping slightly more often 180 minutes before feeding time than 10 minutes before. What do your data show? Did proximity to feeding time have an effect on behaviour? If so, which behaviours were most affected?
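To put numbers on which behaviours changed most, one option is to subtract the counts for the two periods. The sketch below types in the counts from the example contingency table above (rater 2’s behaviour codes) as a plain matrix. With your own data, the same subtraction should work directly on your cont, assuming its row labels are 10 and 180.

```r
# Counts from the worksheet's example table: rows = period, columns = behaviour
counts <- rbind(
  "10"  = c(lying = 3, pacing = 1, sleeping = 5, standing = 1),
  "180" = c(lying = 2, pacing = 1, sleeping = 6, standing = 1)
)

# Positive: coded more often 10 minutes before feeding;
# negative: coded more often 180 minutes before feeding
change <- counts["10", ] - counts["180", ]
change
```

For the example data, lying is up by 1 and sleeping down by 1 closer to feeding time, while pacing and standing are unchanged, which matches the description above.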

Enter a few sentences into PsycEL describing what your data show.