This worksheet contains further information about the analysis of relationships.

You may have noticed that the *Relationships* worksheet did not include Bayes Factor or chi-square calculations for the effect of the child’s age. This is because these techniques, at least as covered in this worksheet, are intended for *unordered* variables only. Culture is an unordered (aka. “nominal”) variable because there is no order we can put China, Iceland and Russia in that is relevant to our investigation. Gender is also unordered. Age, however, is different; it has a clear order: 7, 9, 12, 15. It makes more sense to write the ages in this order than, for example, as: 12, 7, 9, 15.

It is not correct to use the Bayes Factor or chi-square techniques described in the *Relationships* worksheet on age, or any other ordered (aka. “ordinal”) data. For the analysis of ordered variables, see the next worksheet.
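As a small illustration of the distinction, R itself can record whether a variable is ordered. The following is only a sketch using base R's `factor()`; the vectors `age` and `culture` are made-up example values, not columns taken from the `friends` data frame.

```r
# Hypothetical example values, not taken from the friends data
age <- factor(c(12, 7, 9, 15), levels = c(7, 9, 12, 15), ordered = TRUE)
culture <- factor(c("Russia", "China", "Iceland", "China"))

is.ordered(age)      # TRUE: the levels have a meaningful order
is.ordered(culture)  # FALSE: an unordered (nominal) variable
age[1] > age[2]      # comparing values only makes sense for ordered factors
```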

In the *Relationships* worksheet, you ran the following analysis.

```
cont <- table(friends$culture, friends$coded)
cont
```

```
               activity feelings helping length norms trust
  China              47       33      52     28    28     8
  East Germany       75       19      26     58     6    12
  Iceland            78       17      13     62     6    20
  Russia             55       27      42     44    13    15
```

`contingencyTableBF(cont, fixedMargin = "rows", sampleType = "indepMulti")`

```
Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 107633530 ±0%
Against denominator:
Null, independence, a = 1
---
Bayes factor type: BFcontingencyTable, independent multinomial
```

Here’s a more detailed explanation of the part of the command we skimmed over in the *Relationships* worksheet:

`fixedMargin = "rows"`

- The number of children sampled from each culture was decided by the experimenter. This means that the total of each row in the contingency table was known before the children’s responses were coded. The jargon term for this is that the rows have *fixed marginal totals*. In contrast, the totals for each column were not known before the responses were coded, and would likely change a bit if a different sample was used. So, the columns do not have fixed marginal totals. In order to calculate the Bayes Factor, R needs to know whether it is the `rows` or the `cols` that are fixed. In this case, it is the rows.

`sampleType = "indepMulti"`

- If the rows are fixed and the columns are not (or vice versa), then the jargon term for the type of data you have is *independent multinomial* data – `indepMulti` for short. R can also deal with other types of data, but we don’t cover them in this worksheet. The *Examples* section of Jamil et al. (2017) explains the four options that are available, and gives examples of the sorts of data each is used for.

`Bayes factor analysis`

- We’re doing a Bayes Factor analysis!

`[1] Non-indep.`

- This Bayes Factor assesses *non-independence*, i.e. the hypothesis that friendship concepts and culture are not independent, but related.

`(a=1)`

- In order to calculate a Bayes Factor, R has to make some assumptions. By setting `a` to 1, you are saying that, prior to collecting the data, you would have expected every number in the contingency table to be about the same as every other number. This is sometimes called an *uninformative prior* because it’s basically saying you knew nothing about how these data were likely to turn out before you collected them. This is often a bit unrealistic, but we use it here because it’s relatively simple.

`107633530`

- The Bayes Factor (i.e. the main result of this analysis)

`±0%`

- Basically a confidence interval on the Bayes Factor. It’s 0% here because we have so much data, but with smaller samples we might see something like `20 ±5%`, which would mean the Bayes Factor is about 20, give or take 5%. So, it’s between 19 and 21.

`Against denominator:`

- This tells you that the null hypothesis is the denominator of the fraction that is used to calculate the Bayes Factor. So, in other words, you’re getting BF_{10} rather than BF_{01} – see the Evidence worksheet.

`Null, independence, a = 1`

- This tells you that the null hypothesis is that the two variables (e.g. culture and friendship response) are *independent* (i.e. there is no relationship between them). `a = 1` means the same as it did when we last saw it (see above).

`Bayes factor type: BFcontingencyTable,`

- This reminds you which command you ran.

`independent multinomial`

- This reminds you what type of data you assumed (see `sampleType`, above).
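Several of the points above can be checked with a few lines of base R. This is only a sketch: it re-enters the counts from the contingency table printed earlier by hand, rather than using the `friends` data frame, and the names `bf10` and `bf01` are just labels chosen for this example.

```r
# Re-enter the contingency table from the counts printed earlier
cont <- as.table(matrix(
  c(47, 33, 52, 28, 28,  8,
    75, 19, 26, 58,  6, 12,
    78, 17, 13, 62,  6, 20,
    55, 27, 42, 44, 13, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("China", "East Germany", "Iceland", "Russia"),
    c("activity", "feelings", "helping", "length", "norms", "trust")
  )
))

# Fixed margin: the row (culture) totals were set by the design
rowSums(cont)
# Not fixed: the column totals depend on what the children said
colSums(cont)

# Reading "20 ±5%": 5% either side of 20
20 * c(0.95, 1.05)

# The reported Bayes Factor has the null as its denominator (BF10);
# its reciprocal expresses the same evidence with the null on top (BF01)
bf10 <- 107633530
bf01 <- 1 / bf10
bf01
```

Every row of this table sums to 196, which is what you would expect when the number of responses per culture is fixed in advance.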

The Bayes Factor, and chi-square, analyses only tell you that there is some kind of relationship between your two variables. They don’t tell you which part of the contingency table is driving that relationship.

For example, in the contingency table above, it looks like ‘helping’ is more common than ‘length’ in China, but it’s the other way around in the other countries. However, the BF and chi-square analyses don’t tell you this; they just tell you there’s a relationship of some sort between country and response.
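One informal way to see this pattern is to convert each row of the table to proportions. The sketch below re-enters the counts from the full contingency table by hand and uses base R's `prop.table()`; it is a descriptive check only, not a test of the hypothesis.

```r
# Re-enter the full contingency table from the counts printed earlier
cont <- as.table(matrix(
  c(47, 33, 52, 28, 28,  8,
    75, 19, 26, 58,  6, 12,
    78, 17, 13, 62,  6, 20,
    55, 27, 42, 44, 13, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("China", "East Germany", "Iceland", "Russia"),
    c("activity", "feelings", "helping", "length", "norms", "trust")
  )
))

# Proportion of each culture's responses that fall in each category
props <- prop.table(cont, margin = 1)

# Compare just the two categories of interest
round(props[, c("helping", "length")], 2)
```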

If you want to directly test your hypothesis that China is different to the other countries on helping and length, you must reduce your contingency table down to just the relevant rows and columns. So, first, you’d filter to just ‘length’ and ‘helping’, using the `filter` command covered in the *Group Differences* worksheet:

`length.helping <- friends %>% filter(coded == "length" | coded == "helping")`

If you click on `length.helping` in the *Environment* tab of RStudio, you’ll see that we now only have these two types of responses.
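To see what the condition inside `filter` is doing, here is a minimal base R sketch. The vector `coded` below is a made-up example, not the real column from `friends`. The `|` operator means "or", so a row is kept if either comparison is true; base R's `%in%` operator is an equivalent way of writing the same test.

```r
# Hypothetical example values, not the real friends$coded column
coded <- c("length", "trust", "helping", "norms", "length")

# TRUE where the response is 'length' OR 'helping'
keep <- coded == "length" | coded == "helping"
keep

# An equivalent test, written with %in%
keep2 <- coded %in% c("length", "helping")
identical(keep, keep2)

# The values that would survive the filter
coded[keep]
```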

You also need to simplify the data so it’s just China versus Other. We can use the `mutate` command to do this, which we haven’t covered yet, but will cover in a later worksheet (*link to be inserted when worksheet is written*).

`length.helping <- length.helping %>% mutate( cult2 = ifelse(culture == "China", "China", "Other"))`

If you click on `length.helping` in the *Environment* tab of RStudio, you’ll see that we now have a new column called `cult2` that classifies countries as `China` or `Other`.
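The recoding itself is done by `ifelse`, which checks a condition for each value and returns one result where it is true and another where it is false. Here is a minimal sketch in base R; the vector `culture` below is a made-up example rather than the real column.

```r
# Hypothetical example values, for illustration only
culture <- c("China", "Iceland", "Russia", "China", "East Germany")

# "China" stays as "China"; everything else becomes "Other"
cult2 <- ifelse(culture == "China", "China", "Other")
cult2

# Count how many values ended up in each group
table(cult2)
```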

We can now re-do our analysis on this smaller contingency table:

```
cont <- table(length.helping$cult2, length.helping$coded)
cont
```

```
        helping length
  China      52     28
  Other      81    164
```

`contingencyTableBF(cont, fixedMargin = "rows", sampleType = "indepMulti")`

```
Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 43797.69 ±0%
Against denominator:
Null, independence, a = 1
---
Bayes factor type: BFcontingencyTable, independent multinomial
```

The Bayes Factor for the analysis of this smaller table is well over 3, so we conclude that indeed Chinese children differ from the other cultures we investigated on their use of the ‘helping’ and ‘length’ categories.

In the *Group Differences* worksheet, we talked about effect size. For a difference between groups, the effect size is the difference in the group means, divided by the standard deviation. It gives a sense of how large the between-group effect is, relative to the within-group variability. We used the `cohen.d()` command to calculate it.
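As a reminder of what that calculation involves, here is a sketch done by hand in base R. The two groups are made-up numbers, and the use of a pooled standard deviation is an assumption made for this illustration; check the documentation of `cohen.d()` for the exact formula it applies.

```r
# Hypothetical scores for two groups, for illustration only
group_a <- c(4, 5, 6, 7, 8)
group_b <- c(6, 7, 8, 9, 10)

n_a <- length(group_a)
n_b <- length(group_b)

# Pooled standard deviation (an assumption of this sketch)
sd_pooled <- sqrt(((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
                    (n_a + n_b - 2))

# Effect size: difference in means, relative to within-group variability
d <- (mean(group_b) - mean(group_a)) / sd_pooled
d
```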

We can also calculate effect size for contingency tables. The most commonly used measure in this case is Cramer’s Phi, which ranges between zero and one. We use the `cramer` command in the *sjstats* package to calculate it:

```
library(sjstats)
cont
```

```
        helping length
  China      52     28
  Other      81    164
```

`cramer(cont)`

`[1] 0.2798143`

**Note:** If you get an error here, try installing the package, `install.packages("sjstats")`

In this example, Cramer’s Phi is about 0.28. For a contingency table that has 2 rows and 2 columns, this is conventionally described as a “medium” effect size (with 0.1 being “small” and 0.5 being “large”). For larger tables, see this website.
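If you are curious where this number comes from, it can be reproduced from the chi-square statistic using base R alone. This is a sketch under the assumption that `cramer` uses the standard formula, the square root of chi-square divided by the sample size times one less than the smaller table dimension, with no continuity correction. The names `chi2`, `n`, `k` and `v` are chosen just for this example.

```r
# Re-enter the 2 x 2 table from the counts printed above
cont <- as.table(rbind(c(52, 28), c(81, 164)))
dimnames(cont) <- list(c("China", "Other"), c("helping", "length"))

# Chi-square statistic, without the continuity correction
chi2 <- unname(chisq.test(cont, correct = FALSE)$statistic)

n <- sum(cont)       # total number of responses
k <- min(dim(cont))  # smaller of the number of rows and columns

# Standard formula for Cramer's effect size
v <- sqrt(chi2 / (n * (k - 1)))
v
```

The result agrees with the value of about 0.28 reported above.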

Where there is no within-group variability, Cramer’s Phi is 1. Let’s illustrate this with a different contingency table. Don’t worry too much about the first three lines, they’re just a way of setting up a contingency table without having the raw data to base it on. This is not something you’ll need to do that often - it’s only a good idea to do this if you don’t have the raw data.

```
cont2 <- as.table(rbind(c(20,0), c(0,40)))
colnames(cont2) <- c("helping", "length")
rownames(cont2) <- c("China", "other")
cont2
```

```
      helping length
China      20      0
other       0     40
```

`cramer(cont2)`

`[1] 1`

Where there is no between-group variability, Cramer’s Phi is zero. Here’s an illustration:

```
cont3 <- cont2
cont3[] <- as.table(rbind(c(10,10), c(20,20)))
cont3
```

```
      helping length
China      10     10
other      20     20
```

`cramer(cont3)`

`[1] 0`
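One way to see why the effect size is zero here is to compare the observed counts with the counts expected if the two variables were independent. The sketch below uses base R's `chisq.test()` only to obtain those expected counts; the table is re-entered by hand so the example runs on its own.

```r
# Re-enter the table with no between-group variability
cont3 <- as.table(rbind(c(10, 10), c(20, 20)))
dimnames(cont3) <- list(c("China", "other"), c("helping", "length"))

# Counts expected under independence of row and column
expected <- chisq.test(cont3, correct = FALSE)$expected
expected

# Observed and expected counts match, so there is nothing left to explain
all.equal(as.vector(expected), as.vector(cont3))

# Equivalently, both groups split their responses in the same proportions
prop.table(cont3, margin = 1)
```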

This material is distributed under a Creative Commons licence. CC-BY-SA 4.0. It is part of Research Methods in R, by Andy Wills.