More on relationships

This worksheet contains further information about the analysis of relationships.

Ordered variables
Bayes Factor analysis
Interpreting large contingency tables
Effect size for contingency tables

Ordered variables

You may have noticed that the Relationships worksheet did not include Bayes Factor or chi-square calculations for the effect of the child’s age. This is because these techniques, at least as covered in this worksheet, are intended for unordered variables only. Culture is an unordered (a.k.a. “nominal”) variable because there is no order we can put China, Iceland and Russia in that is relevant to our investigation. Gender is also unordered. Age, however, is different; it has a clear order: 7, 9, 12, 15. It makes more sense to write the ages in this order than, for example, as: 12, 7, 9, 15.

It is not correct to use the Bayes Factor or chi-square techniques described in the Relationships worksheet on age, or any other ordered (a.k.a. “ordinal”) data. For the analysis of ordered variables, see the next worksheet.
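The difference between unordered and ordered variables can also be seen in how R stores them. The following is a minimal sketch in base R; the age values and the names age, age_nominal and age_ordinal are made up for illustration and are not taken from the friends data.

```r
# Hypothetical ages, written down in an arbitrary order
age <- c(12, 7, 9, 15, 7, 12)

# Unordered (nominal) factor: the levels are just labels
age_nominal <- factor(age)

# Ordered (ordinal) factor: the levels carry the order 7 < 9 < 12 < 15
age_ordinal <- factor(age, levels = c(7, 9, 12, 15), ordered = TRUE)

is.ordered(age_nominal)          # FALSE
is.ordered(age_ordinal)          # TRUE
age_ordinal[1] > age_ordinal[2]  # TRUE, because 12 comes after 7 in the ordering
```

Comparisons such as "greater than" only make sense for the ordered version, which is one way of seeing why ordered data call for different techniques.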

Bayes Factor analysis

In the Relationships worksheet, you ran the following analysis.

cont <- table(friends$culture, friends$coded)
cont

              
               activity feelings helping length norms trust
  China              47       33      52     28    28     8
  East Germany       75       19      26     58     6    12
  Iceland            78       17      13     62     6    20
  Russia             55       27      42     44    13    15

contingencyTableBF(cont, fixedMargin = "rows", sampleType = "indepMulti")

Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 107633530 ±0%

Against denominator:
  Null, independence, a = 1 
---
Bayes factor type: BFcontingencyTable, independent multinomial

Further explanation of command

Here’s a more detailed explanation of the part of the command we skimmed over in the Relationships worksheet:

fixedMargin = "rows" - The number of children sampled from each culture was decided by the experimenter. This means that the total of each row in the contingency table was known before the children’s responses were coded. The jargon term for this is that the rows have fixed marginal totals. In contrast, the totals for each column were not known before the responses were coded, and would likely change a bit if a different sample was used. So, the columns do not have fixed marginal totals. In order to calculate the Bayes Factor, R needs to know whether it is the rows or the columns that are fixed. In this case, it is the rows.

sampleType = "indepMulti" - If the rows are fixed and the columns are not (or vice versa), then the jargon term for the type of data you have is independent multinomial data – indepMulti for short. R can also deal with other types of data, but we don’t cover them in this worksheet. The Examples section of Jamil et al. (2017) explains the four options that are available, and gives examples of the sorts of data each is used for.
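The idea of fixed row totals can be checked against the counts printed above. The following is a minimal sketch in base R that re-enters those counts by hand; the name cont_counts is made up for illustration. With the raw data loaded, rowSums(cont) and colSums(cont) would do the same job.

```r
# Counts copied from the contingency table printed above
cont_counts <- rbind(
  China          = c(47, 33, 52, 28, 28,  8),
  "East Germany" = c(75, 19, 26, 58,  6, 12),
  Iceland        = c(78, 17, 13, 62,  6, 20),
  Russia         = c(55, 27, 42, 44, 13, 15)
)
colnames(cont_counts) <- c("activity", "feelings", "helping",
                           "length", "norms", "trust")

# Row totals: one per culture (the margin fixed by the experimenter)
rowSums(cont_counts)

# Column totals: one per response category (not fixed in advance)
colSums(cont_counts)
```

Every row adds up to 196, which is what you would expect when the number of children per culture is set in advance. The column totals differ from one another, because they depend on what the children happened to say.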

Further explanation of output

Bayes factor analysis - We’re doing a Bayes Factor analysis!

[1] Non-indep. - This Bayes Factor assesses non-independence, e.g. that friendship concepts and culture are not independent; in other words, that they are related.

(a=1) - In order to calculate a Bayes Factor, R has to make some assumptions. By setting a to 1, you are saying that, prior to collecting the data, you would have expected every number in the contingency table to be about the same as every other number. This is sometimes called an uninformative prior because it’s basically saying you knew nothing about how these data were likely to turn out before you collected them. This is often a bit unrealistic, but we use it here because it’s relatively simple.

107633530 - The Bayes Factor (i.e. the main result of this analysis)

±0% - Basically a confidence interval on the Bayes Factor. It’s 0% here because we have so much data, but with smaller samples we might see something like 20 ±5%, which would mean the Bayes Factor is about 20, give or take 5%. So, it’s between 19 and 21.

Against denominator: - This tells you that the null hypothesis is the denominator of the fraction that is used to calculate the Bayes Factor. So, in other words, you’re getting BF₁₀ rather than BF₀₁ – see the Evidence worksheet.

Null, independence, a = 1 - This tells you that the null hypothesis is that the two variables (e.g. culture and friendship response) are independent (i.e. there is no relationship between them). a = 1 means the same as it did when we last saw it (see above).

Bayes factor type: BFcontingencyTable, - This reminds you which command you ran.

independent multinomial - This reminds you what type of data you assumed (see sampleType, above.)
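Two pieces of arithmetic in the explanations above can be checked directly. The following is a minimal sketch using the hypothetical 20 ±5% example rather than real output; the names bf10, prop_error, bf_range and bf01 are made up for illustration.

```r
# Hypothetical Bayes Factor and proportional error, as in the "20 ±5%" example
bf10 <- 20
prop_error <- 0.05

# Range implied by "give or take 5%"
bf_range <- c(lower = bf10 * (1 - prop_error),
              upper = bf10 * (1 + prop_error))
bf_range  # 19 to 21

# BF01 puts the null hypothesis in the numerator instead of the denominator,
# so it is the reciprocal of BF10
bf01 <- 1 / bf10
bf01  # 0.05
```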

Interpreting large contingency tables

The Bayes Factor, and chi-square, analyses only tell you that there is some kind of relationship between your two variables. They don’t tell you which part of the contingency table is driving that relationship.

For example, in the contingency table above, it looks like ‘helping’ is more common than ‘length’ in China, but it’s the other way around in the other countries. However, the Bayes Factor and chi-square analyses don’t tell you this; they just tell you there’s a relationship of some sort between country and response.
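One informal way to see where a pattern like this might lie is to turn the counts into row proportions. The following is a minimal sketch in base R that re-enters the counts from the table above; the names cont_counts and row_props are made up for illustration. It only describes the table and is not a test of any hypothesis.

```r
# Counts copied from the contingency table printed earlier
cont_counts <- rbind(
  China          = c(47, 33, 52, 28, 28,  8),
  "East Germany" = c(75, 19, 26, 58,  6, 12),
  Iceland        = c(78, 17, 13, 62,  6, 20),
  Russia         = c(55, 27, 42, 44, 13, 15)
)
colnames(cont_counts) <- c("activity", "feelings", "helping",
                           "length", "norms", "trust")

# margin = 1 divides each count by its row total,
# giving the proportion of each response type within each culture
row_props <- prop.table(cont_counts, margin = 1)
round(row_props, 2)
```

Comparing the ‘helping’ and ‘length’ columns row by row shows the pattern described above.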

If you want to directly test your hypothesis that China is different to the other countries on helping and length, you must reduce your contingency table down to just the relevant rows and columns. So, first, you’d filter to just ‘length’ and ‘helping’, using the filter command covered in the Group Differences worksheet:

length.helping <- friends %>% filter(coded == "length" | coded == "helping")

If you click on length.helping in the Environment tab of RStudio, you’ll see that we now only have these two types of responses.

You also need to simplify the data so it’s just China versus Other. We can use the mutate command to do this, which we haven’t covered yet, but will cover in a later worksheet (link to be inserted when worksheet is written).

length.helping <- length.helping %>% mutate( cult2 = ifelse(culture == "China", "China", "Other"))

If you click on length.helping in the Environment tab of RStudio, you’ll see that we now have a new column called cult2 that classifies countries as China or Other.
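The ifelse part of that command does the actual recoding. The following is a minimal sketch of how it behaves, using a short made-up vector called culture rather than the real friends data.

```r
# Hypothetical stand-in for the culture column of the data
culture <- c("China", "Iceland", "Russia", "East Germany", "China")

# ifelse(test, yes, no): where the test is TRUE the result is "China",
# everywhere else it is "Other"
cult2 <- ifelse(culture == "China", "China", "Other")
cult2

# Cross-tabulating the old labels against the new ones is a quick check
# that the recoding did what was intended
table(culture, cult2)
```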

We can now re-do our analysis on this smaller contingency table:

cont <- table(length.helping$cult2, length.helping$coded)
cont

       
        helping length
  China      52     28
  Other      81    164

contingencyTableBF(cont, fixedMargin = "rows", sampleType = "indepMulti")

Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 43797.69 ±0%

Against denominator:
  Null, independence, a = 1 
---
Bayes factor type: BFcontingencyTable, independent multinomial

The Bayes Factor for the analysis of this smaller table is well over 3, so we conclude that indeed Chinese children differ from the other cultures we investigated on their use of the ‘helping’ and ‘length’ categories.

Effect size for contingency tables

In the Group Differences worksheet, we talked about effect size. For a difference between groups, the effect size is the difference in the group means, divided by the standard deviation. It gives a sense of how large the between-group effect is, relative to the within-group variability. We used the cohen.d() command to calculate it.

We can also calculate effect size for contingency tables. The most commonly used measure in this case is Cramer’s Phi, which ranges between zero and one. We use the cramer command in the sjstats package to calculate it:

library(sjstats)
cont

       
        helping length
  China      52     28
  Other      81    164

cramer(cont)

[1] 0.2798143

Note: If you get an error here, try installing the package with install.packages("sjstats") and then run the commands again.

In this example, Cramer’s Phi is about 0.28. For a contingency table that has 2 rows and 2 columns, this is conventionally described as a “medium” effect size (with 0.1 being “small” and 0.5 being “large”). For larger tables, see this website.
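To see where this number comes from, the standard formula can be worked through by hand: take the chi-square statistic, divide it by the sample size times one less than the smaller of the number of rows and columns, then take the square root. The following is a minimal sketch in base R; cont_small and cramer_by_hand are made-up names, not part of sjstats, and the sketch assumes the chi-square statistic is calculated without continuity correction.

```r
# Counts copied from the 2 x 2 table above
cont_small <- rbind(China = c(52, 28), Other = c(81, 164))
colnames(cont_small) <- c("helping", "length")

# Hypothetical helper: effect size from the chi-square statistic
cramer_by_hand <- function(tab) {
  # correct = FALSE switches off the continuity correction for 2 x 2 tables
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)                     # total number of observations
  k <- min(nrow(tab), ncol(tab))    # smaller of number of rows and columns
  sqrt(chi2 / (n * (k - 1)))
}

cramer_by_hand(cont_small)  # about 0.28
```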

Where there is no within-group variability, Cramer’s Phi is 1. Let’s illustrate this with a different contingency table. Don’t worry too much about the first three lines; they’re just a way of setting up a contingency table without having the raw data to base it on. This is not something you’ll need to do often, and it’s only a good idea if you don’t have the raw data.

cont2 <- as.table(rbind(c(20,0), c(0,40)))
colnames(cont2) <- c("helping", "length")
rownames(cont2) <- c("China", "other")
cont2

      helping length
China      20      0
other       0     40

cramer(cont2)

[1] 1

Where there is no between-group variability, Cramer’s Phi is zero. Here’s an illustration:

cont3 <- cont2
cont3[] <- as.table(rbind(c(10,10), c(20,20)))
cont3

      helping length
China      10     10
other      20     20

cramer(cont3)

[1] 0

This material is distributed under a Creative Commons licence. CC-BY-SA 4.0. It is part of Research Methods in R, by Andy Wills.