Before starting this exercise, you should have completed
**all** the previous Absolute
Beginners’, Part 1 workshop exercises. Each section below indicates
which of the earlier worksheets are particularly relevant.

**Relevant worksheet:** Introduction to RStudio, Exploring data

Download this CSV file, which contains
all the data you need for this worksheet. Then, create or open an
appropriate project on RStudio Server for this analysis
(**Plymouth University students**: use the project
‘psyc414’ created in the inter-rater reliability
worksheet), upload your
CSV to your project, and create a new R script called
`chi.R`.

**Now, add these comments and commands to your script and run
them**; they will load the *tidyverse* package, and load
your data.

```
## Relationships
# Load package
library(tidyverse)
# Load data into 'friends'
friends <- read_csv("chi.csv")
```

Look at the data by clicking on it in the *Environment* tab in
RStudio. Each row is one participant in an interview about friendships.
Here’s what each of the columns in the data set contains:

Column | Description | Values |
---|---|---|
subj | Anonymous ID number of participant | a number |
age | Age of the participant | One of: “7 years”, “9 years”, “12 years”, “15 years” |
gender | Gender of the participant | One of: “male”, “female” |
culture | Culture of participant | One of: “China”, “East Germany”, “Iceland”, or “Russia” |
coded | How their interview response was coded | One of: “activity”, “feelings”, “helping”, “length”, “norms”, or “trust” |

This is a large dataset comprising over 700 participants of different ages, genders, and cultures. It is based on, but not identical to, real data on this topic analysed by Michaela (Gummerum et al., 2008). An R script was used to generate these data from Michaela’s more complex data set.

Let’s start by looking at how often each of the coded responses
(i.e. *activity, feelings, helping, length, norms, and trust*)
appears in the interviews. We could do this by hand, but it would be slow
and error prone. Instead, we use the `table` command in R to
do it for us.

**Add this comment and command to your script and run it
(CTRL+ENTER)**:

```
# Table 'coded' column of 'friends'
table(friends$coded)
```

```
activity feelings  helping   length    norms    trust
     255       96      133      192       53       55
```

R gives us a table, which reports how often each of the coded
responses occurred in the data set. We can see that *activity*
was used the most, *norms* the least. In fact, *activity*
was used more than *feelings, norms, and trust* combined.

Here’s a step-by-step explanation of how the above command works. You’ll need this in a moment to calculate some frequency tables for yourself.

`table()` - This command counts how many times each thing occurs (in this case, how often each type of coded response occurs).

`friends$coded` - We need to tell `table()` where to find the data we are interested in. In this case, it’s the `coded` column of the `friends` *dataframe* that we loaded earlier. We tell R this by typing `friends$coded`. Yes, that’s `$`, the same symbol as we use to indicate US Dollars. However, it doesn’t mean “dollars” in R. It means column. So, `friends$coded` means the `coded` column of the `friends` dataframe.
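If you’d like to see `$` and `table()` at work on something small enough to check by eye, here is a minimal sketch. The data frame `toy` and its contents are made up purely for illustration; they are not part of the worksheet’s data.

```r
# A tiny made-up data frame, for illustration only (not the real data)
toy <- data.frame(
  subj  = 1:5,
  coded = c("trust", "activity", "trust", "norms", "activity")
)

# The 'coded' column of the 'toy' data frame
toy$coded

# Count how often each value occurs in that column
toy_counts <- table(toy$coded)
toy_counts
```

With only five rows, you can count the responses yourself and confirm that `table()` reports the same numbers.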

Now produce frequency tables for each of the other *variables*
in this *dataframe* (i.e. `age`, `gender`,
and `culture`). You do this by changing the command
`table(friends$coded)` so that it now refers to a different
column in the `friends` dataframe. Re-read the explanation of
the command above if you’re stuck.

```
# EXERCISE 1
# Table 'age' column of 'friends'
# Table 'gender' column of 'friends'
# Table 'culture' column of 'friends'
```

**Enter the above comments into your script, and fill in and
run the correct command underneath each comment.**

Do children’s ideas about friendship differ across cultures? We can
use the `table` command to look at this, too. We use it to
produce a *frequency table* for each of the different cultures in
our sample, like this:

**Add the following comments and commands to your script and
run them:**

```
# Produce culture x coded contingency table, put into 'cont'
cont <- table(friends$culture, friends$coded)
# Display contingency table
cont
```

```
               activity feelings helping length norms trust
  China              47       33      52     28    28     8
  East Germany       75       19      26     58     6    12
  Iceland            78       17      13     62     6    20
  Russia             55       27      42     44    13    15
```

Here’s an explanation of each part of that command:

`cont <-` - Store this table as `cont`, so we can use it later. The command `<-` stores the thing on its right in the thing on its left.

`table(rows, columns)` - The R command for producing tables. We replace the word `rows` with the name of the variable we want to appear on the rows of the table, and we replace the word `columns` with the name of the variable we want to appear in the columns of the table.

`friends$culture` - The `culture` column of the `friends` data frame. We’ve put this first in our `table` command, so `culture` appears as rows.

`friends$coded` - The `coded` column of the `friends` data frame. This appears second in our `table` command, so `coded` appears as columns.

`cont` - Lastly, we type `cont` on its own to display the contingency table in the Console (clicking on `cont` in the Environment tab in RStudio won’t work in this case).
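To see how the order of the two columns inside `table()` decides which variable ends up on the rows, here is a small sketch. The data frame `toy` is made up for illustration, and the names `by_culture` and `by_coded` are just labels chosen for this example.

```r
# A tiny made-up data frame, for illustration only (not the real data)
toy <- data.frame(
  culture = c("China", "China", "Iceland", "Iceland", "Russia"),
  coded   = c("helping", "helping", "activity", "activity", "helping")
)

# 'culture' first, so cultures are the rows
by_culture <- table(toy$culture, toy$coded)
by_culture
dim(by_culture)  # number of rows, then number of columns

# 'coded' first, so coded responses are the rows
by_coded <- table(toy$coded, toy$culture)
by_coded
dim(by_coded)
```

In the real analysis above, `culture` comes first, so the four cultures are the rows and the six coded responses are the columns.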

R gives us a table, showing how many of each response were made in
each culture. This is called a *contingency table*. The name
*contingency table* comes from the word *contingent*, as
in, for example “Getting your degree is *contingent* on passing
your exams”. A contingency table gives the frequencies for one variable
(e.g. the interview responses) *contingent* on another variable
(e.g. the culture of the participants).

Close inspection of the contingency table reveals that, for example, the “helping” response is more common in China than in Iceland. The “activity” response is more common in Iceland than in Russia. So, it does look like children’s conceptions of friendship vary between cultures. Of course, not everyone in the same culture responded the same way but, overall, some types of response are more or less likely in some cultures than others.

Some people find it quite hard to notice these kinds of patterns in
contingency tables, and the patterns are certainly harder to spot in a
table than in a good visualization. The visualization we’re going to use
here is called a *mosaic plot*. The command to do this in R is as
follows:

**Add the following comment and command to your script and run
it**:

```
# Display mosaic plot of 'cont'
mosaicplot(cont)
```

It’s called a *mosaic* plot because it’s made up of
*tiles*.

In the above example, the *width* of each tile represents the
number of participants from each `culture`. We collected data
from approximately the same number of people from each culture, so all
tiles are approximately the same width.

The *height* of each tile is determined by the frequency of
each of the responses (feelings, helping, etc.) within each culture –
the more common a response within a particular culture, the taller the
tile.

Looking at this mosaic plot, it’s visually obvious that “length” is a less common response in China than in other countries.
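The widths and heights described above can also be worked out as numbers. This is a rough sketch, assuming the contingency table is re-entered by hand from the output shown earlier (in your own script, you would already have `cont`). The names `widths` and `heights` are chosen just for this example.

```r
# Contingency table re-entered by hand from the output shown earlier
cont <- as.table(matrix(
  c(47, 33, 52, 28, 28,  8,
    75, 19, 26, 58,  6, 12,
    78, 17, 13, 62,  6, 20,
    55, 27, 42, 44, 13, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("China", "East Germany", "Iceland", "Russia"),
    c("activity", "feelings", "helping", "length", "norms", "trust")
  )
))

# Tile widths: number of participants from each culture
widths <- rowSums(cont)
widths

# Tile heights: share of each response within a culture (each row sums to 1)
heights <- prop.table(cont, margin = 1)
round(heights, 2)
```

The `length` column of `heights` lets you check the pattern you can see in the plot: the share for China is the smallest of the four.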

So, it looks like there’s some kind of relationship between culture
and conceptions of friendship … but how good is the evidence that this
is a real result, and not just some kind of fluke we can put down to
chance? As we covered in the *Evidence* worksheet, the best way
to answer this question is to calculate a Bayes Factor (BF). In R, we
can calculate the Bayes Factor for a contingency table like this:

**Add the following comments and commands to your script and
run them**:

```
# Load the BayesFactor package
library(BayesFactor, quietly = TRUE)
# Calculate Bayes Factor for contingency table 'cont'
contingencyTableBF(cont, fixedMargin = "rows", sampleType = "indepMulti")
```

```
Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 107633530 ±0%
Against denominator:
Null, independence, a = 1
---
Bayes factor type: BFcontingencyTable, independent multinomial
```

The Bayes Factor is reported on the third line, towards the right.
The Bayes Factor in this example is about 107.6 *million*. This
means it’s more than 100 million times more likely that there is a
relationship between culture and friendship concepts, than there
isn’t.

Psychologists generally agree to believe the relationship is real if
the Bayes Factor exceeds 3, and generally agree to believe the
relationship is *not* real if the Bayes Factor is less than 0.33.
So, in this example, we have very strong evidence for the existence of a
relationship.
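As a way of making that convention concrete, here is a small sketch of it written as an R function. `interpret_bf` is a hypothetical helper written for this example; it is not part of the *BayesFactor* package. The label “inconclusive” for values between 0.33 and 3 is an assumption made here, not something stated above.

```r
# Hypothetical helper (not part of the BayesFactor package) that applies
# the conventional thresholds of 3 and 0.33 to a Bayes Factor
interpret_bf <- function(bf) {
  if (bf > 3) {
    "evidence for a relationship"
  } else if (bf < 0.33) {
    "evidence against a relationship"
  } else {
    # Assumption: values between the two thresholds are labelled inconclusive
    "inconclusive"
  }
}

interpret_bf(107633530)  # the Bayes Factor from this worksheet
interpret_bf(0.1)
interpret_bf(1.5)
```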

If you’re curious about what the rest of the output means, see more on relationships.

The first line, `library(BayesFactor, quietly = TRUE)`, loads the *BayesFactor* package, which is a set of extra commands that allows R to calculate Bayes Factors.

`contingencyTableBF()` - The command for calculating a BF (Bayes Factor) for a contingency table.

`cont` - Our contingency table (we stored it in `cont` earlier on in this worksheet).

`fixedMargin = "rows", sampleType = "indepMulti"` - This tells R that the different groups in your sample (in this case, different cultures) appear as the `rows` of your contingency table. If you’d put them as the columns (e.g. if you’d used `table(friends$coded, friends$culture)`), then you would change this to `fixedMargin = "cols"`. For a more detailed explanation, see more on relationships.

There’s a long history in psychology of performing a
*contingency-table chi-square* test to examine the level of
evidence for a relationship. The results of such tests are widely
misinterpreted by psychologists, but some still like to see them anyway.
Here’s how to calculate one for these data:

**Add the following comment and command to your script and run
it**:

```
# Calculate traditional chi-square test on 'cont' contingency table
chisq.test(cont)
```

```
Pearson's Chi-squared test
data: cont
X-squared = 89.169, df = 15, p-value = 1.417e-12
```

The key result here is the `p-value`. It’s important to
emphasize that this *p value* is **not** the
probability that the observed relationship is due to chance. As we
covered in the *Evidence* worksheet, there is no way to explain
this *p value* that is simple, useful, and accurate.

Nonetheless, the convention is that if the *p value* is less
than 0.05, psychologists will generally believe you when you assert that
the relationship is not due to chance. If the *p value* is
greater than 0.05, they will generally be skeptical.

The *p value* in this example is very small, so has been
reported in *standard form*, and is read as \(1.417 \times
10^{-12}\). You would have been taught standard form in school
but, as a reminder, \(1.417 \times 10^{-12} = .000000000001417\). See
this BBC
Bitesize revision guide on standard form if you need a bit more
explanation than that.
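You can also ask R to show you what a number in standard form looks like when written out in full. This is a minimal sketch; the name `p` is just a label for this example, and the value is typed in by hand from the output above.

```r
# The p value as R printed it, in standard form
p <- 1.417e-12

# Show it written out in full, rather than in standard form
format(p, scientific = FALSE)

# Check that it is the same number as 1.417 x 10^-12
all.equal(p, 1.417 * 10^-12)

# Compare it with the conventional .05 threshold
p < 0.05
```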

The reported *p value* is less than .05 in this example, and
so psychologists will generally believe your result is real.

In addition to the *p value*, psychologists will generally
record at least two further numbers in their articles. The first is the
chi-square value, written as `X-squared` in the above output,
but as \(\chi^2\) in articles.

The second is the *degrees of freedom* (`df` in the
above output). In this case, *degrees of freedom* relates to the
size of the contingency table, and is the number of columns, minus one,
multiplied by the number of rows, minus one
(i.e. `(rows - 1) x (cols - 1)`).
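For the table in this worksheet, the calculation works out as follows. This is just the arithmetic written as R; the names `n_rows`, `n_cols`, and `deg_free` are chosen for this example.

```r
# Size of the contingency table used in this worksheet
n_rows <- 4  # cultures
n_cols <- 6  # coded responses

# Degrees of freedom: (rows - 1) x (cols - 1)
deg_free <- (n_rows - 1) * (n_cols - 1)
deg_free
```

This gives 15, matching the `df` reported by `chisq.test(cont)`.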

If you were writing up this analysis in a report, you would write something like:

*The coded friendship concepts occurred with different frequency
across cultures, BF = \(1.08 \times 10^{8}\), \(\chi^2\)(15) = 89.2, p < .001, see Table
1.*

“Table 1” would be the contingency table you’d produced with the
`table` command.
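If you’d rather not copy the chi-square numbers into your report by hand, one possible approach is to pull them out of the test result and format them in R. The sketch below assumes the contingency table is re-entered by hand from the output shown earlier; the names `result`, `p_text`, and `report` are chosen for this example, and the plain-text label “chi-square” stands in for the \(\chi^2\) symbol.

```r
# Contingency table re-entered by hand from the output shown earlier
cont <- as.table(matrix(
  c(47, 33, 52, 28, 28,  8,
    75, 19, 26, 58,  6, 12,
    78, 17, 13, 62,  6, 20,
    55, 27, 42, 44, 13, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("China", "East Germany", "Iceland", "Russia"),
    c("activity", "feelings", "helping", "length", "norms", "trust")
  )
))

# Run the test and keep the result, rather than just printing it
result <- chisq.test(cont)

# Format the p value: "< .001" when very small, otherwise three decimal
# places with the leading zero dropped
p_text <- if (result$p.value < .001) {
  "< .001"
} else {
  paste("=", sub("^0", "", sprintf("%.3f", result$p.value)))
}

# Assemble the chi-square part of the report sentence
report <- sprintf("chi-square(%d) = %.1f, p %s",
                  as.integer(result$parameter),
                  unname(result$statistic),
                  p_text)
report
```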

As discussed in the *Evidence* worksheet, it is also important
to report the method by which you calculated your Bayes Factor. So,
somewhere in your report, you should say something like:

*Bayes Factors were calculated using the BayesFactor package
(Morey & Rouder, 2022), within the R environment (R Core Team,
2022).*

You can get the references for these citations by typing
`citation("BayesFactor")` and `citation()`.

Each step in this exercise can be completed by slightly modifying a command you have already used.

`#EXERCISE 2`

**Add these modified commands, along with the above comment, to
your script and run them**.

Here are the things you should do:

- Produce a contingency table that shows the relationship between
gender and concepts of friendship in this data set. Do this by modifying
`cont <- table(friends$culture, friends$coded)` appropriately.

If your modified command still uses `cont`, the commands
you used before should now work without having to modify them:

- Produce a mosaic plot from this contingency table.

- Calculate the Bayes Factor for the relationship. **Enter your Bayes Factor into your lab book.**

- Perform a contingency chi-square test.

When you write up an experiment, you often need to provide some summary information about the sample, including the exact number of participants, and the gender balance. R makes it easy to work these things out, as this worksheet shows: sample characteristics.

For more detailed information on the analyses covered in this worksheet, see more on relationships.

This material is distributed under a Creative Commons licence. CC-BY-SA 4.0. It is part of Research Methods in R, by Andy Wills