More on relationships, part 2.

This worksheet provides more detailed information on some of the concepts covered in the Relationships, part 2 worksheet.

Correlation co-efficients

In the main worksheet, we used this command to calculate a correlation co-efficient:

cor(gdata$ingroup, gdata$outgroup)

[1] 0.6641777

More specifically, this command calculates Pearson’s correlation co-efficient. If people don’t say what calculation they’ve done, they’ve almost certainly done this one.
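To see what cor is doing behind the scenes, here’s a minimal sketch that calculates Pearson’s co-efficient by hand and checks it against cor. The vectors x and y and the function pearson_by_hand are made up for illustration. They are not part of the main worksheet.

```r
# Hypothetical example data (made up for illustration; not the gdata set)
x <- c(2, 4, 5, 7, 9, 10)
y <- c(1, 3, 6, 6, 8, 12)

# Pearson's r: how much x and y vary together, divided by how much
# each one varies on its own
pearson_by_hand <- function(x, y) {
  dx <- x - mean(x)  # deviations from the mean of x
  dy <- y - mean(y)  # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual <- pearson_by_hand(x, y)
r_builtin <- cor(x, y)  # method = "pearson" is the default
print(c(manual = r_manual, builtin = r_builtin))
```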

Pearson’s calculation is a good choice if you expect the relationship between your two variables to be linear (i.e. a straight line), because the calculation assumes that the relationship is linear. However, if you think the relationship is not a straight line (for example, one variable consistently goes up as the other goes up, but along a curve), then there are better options. The best known one is Spearman’s correlation, which ranks the data in each variable separately, and then uses those ranks in Pearson’s calculation. In R, we calculate Spearman’s correlation like this:

cor(gdata$ingroup, gdata$outgroup, method="spearman")

[1] 0.4435664

Notice that this gives us a different answer from Pearson’s calculation. In this example it’s lower than Pearson’s, although in other cases it can be higher.
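Here’s a minimal sketch of what Spearman’s method does: rank each variable, then apply Pearson’s calculation to the ranks. The data are made up: y rises steadily with x, but along a curve, so Spearman’s co-efficient is exactly 1 while Pearson’s is not.

```r
# Hypothetical data (made up): y always rises with x, but along a curve
x <- 1:10
y <- x^3

# Spearman's method: rank each variable separately...
rx <- rank(x)
ry <- rank(y)
# ...then use those ranks in Pearson's calculation
spearman_by_hand <- cor(rx, ry)
spearman_builtin <- cor(x, y, method = "spearman")

pearson <- cor(x, y)
print(c(pearson = pearson, spearman = spearman_builtin))
```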

A third calculation is also possible — Kendall’s correlation co-efficient. Kendall’s method is somewhat similar to Spearman’s, in that it is based on ranks. It is less used than the other two methods, and is not covered in this introductory course. If you need to calculate it, use method="kendall".
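As a sketch of how Kendall’s method is "based on ranks", the example below compares every pair of observations and asks whether the two variables put that pair in the same order. It assumes there are no tied values; kendall_by_hand and the data are made up for illustration. It also shows all three methods side by side.

```r
# Hypothetical data (made up for illustration), with no tied values
x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7, 7.3, 8.9)
y <- c(2.0, 1.8, 3.5, 3.9, 6.1, 5.2, 8.0, 9.4)

# Kendall's tau, assuming no ties: (agreeing pairs - disagreeing pairs)
# divided by the total number of pairs
kendall_by_hand <- function(x, y) {
  n <- length(x)
  s <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # +1 if both variables order this pair the same way, -1 if not
      s <- s + sign(x[j] - x[i]) * sign(y[j] - y[i])
    }
  }
  s / choose(n, 2)
}

# All three methods that cor() offers
coefs <- sapply(c("pearson", "spearman", "kendall"),
                function(m) cor(x, y, method = m))
print(coefs)
print(kendall_by_hand(x, y))
```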

Bayes Factor analysis

In the main worksheet, we ran the following command:

library(BayesFactor, quietly = TRUE)
correlationBF(gdata$ingroup, gdata$outgroup)

Bayes factor analysis
--------------
[1] Alt., r=0.333 : 89.70525 ±0%

Against denominator:
  Null, rho = 0 
---
Bayes factor type: BFcorrelation, Jeffreys-beta*

Here’s a more detailed explanation of the output of that test – we’ll go through each bit in turn:

Bayes factor analysis - You’re doing a Bayes Factor analysis.

[1] Alt., r=0.333 - In order to calculate a Bayes Factor, R has to make some assumptions about how big the correlation is likely to be. The correlationBF command makes some broad assumptions that cover the range of correlation co-efficients typically seen in psychology.

More specifically, correlationBF assumes a beta distribution of correlation co-efficients, stretched so that it covers the whole range from -1 to +1, with a scale of 0.333; this is where r=0.333 comes from. That description probably didn’t make much sense unless you have a very strong maths background, so here’s the same idea in plainer terms. It’s basically an assumption that co-efficients close to zero are the most likely, that small and moderate co-efficients (positive or negative) are fairly plausible, and that very large co-efficients (beyond about ±.9) are quite unlikely.
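If you’d like to see this assumption as a density plot, here’s a sketch. It assumes, following the BayesFactor documentation, that the prior is a beta(1/rscale, 1/rscale) distribution shifted and scaled onto -1 to +1, with rscale = 1/3 (the r=0.333 in the output). The function name prior_density is made up for this example.

```r
# Assumed prior: beta(a, a) with a = 1 / rscale, stretched from [0, 1]
# onto the range of a correlation, [-1, 1]
rscale <- 1 / 3
a <- 1 / rscale

prior_density <- function(rho) {
  # map rho in [-1, 1] to [0, 1]; dividing by 2 keeps the total area at 1
  dbeta((rho + 1) / 2, a, a) / 2
}

# Draw the density across all possible correlation co-efficients
curve(prior_density, from = -1, to = 1,
      xlab = "Population correlation (rho)", ylab = "Density")
```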

89.70525 - The Bayes Factor (i.e. the main result of this analysis)

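±0% - Basically, an estimate of how precise the calculation of the Bayes Factor is. The BayesFactor package works out the Bayes Factor numerically, and this figure is the proportional error of that estimate. It’s 0% here, so the Bayes Factor shown is essentially exact, but in some analyses you might see something like 20 ±5%, which would mean the Bayes Factor is about 20, give or take 5%. So, it’s between 19 and 21.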
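The "give or take 5%" arithmetic above, as a short sketch. The values 20 and 5% are the hypothetical ones from the text. With a real result, the BayesFactor function extractBF() should return the Bayes Factor and its proportional error as the bf and error columns of a data frame.

```r
# Hypothetical values from the example in the text
bf <- 20
error <- 0.05  # 5%, written as a proportion

# Lower and upper ends of "give or take 5%"
bf_range <- c(lower = bf * (1 - error), upper = bf * (1 + error))
print(bf_range)  # 19 and 21
```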

Against denominator: - This tells you that the null hypothesis is the denominator of the fraction that is used to calculate the Bayes Factor. So, in other words, you’re getting BF₁₀ rather than BF₀₁ – see the Evidence worksheet.
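Because BF₀₁ is the same fraction turned upside down, you can get it from BF₁₀ by dividing 1 by it. A minimal sketch using the Bayes Factor from the output above:

```r
# BF10 from the output above: alternative on top, null underneath
bf10 <- 89.70525

# Flipping the fraction gives BF01: null on top, alternative underneath
bf01 <- 1 / bf10
print(bf01)
```

A BF₀₁ well below 1 says the same thing as a BF₁₀ well above 1: the data favour the alternative hypothesis.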

Null, rho = 0 - This tells you that the null hypothesis is that the correlation co-efficient (called rho here) is zero.

Bayes factor type: BFcorrelation, Jeffreys-beta - You’re doing a Bayes Factor analysis of correlation, using the beta distribution (see above) suggested by Jeffreys.

Traditional analysis

In the main worksheet, we ran the following command:

cor.test(gdata$ingroup, gdata$outgroup)


    Pearson's product-moment correlation

data:  gdata$ingroup and gdata$outgroup
t = 4.2608, df = 23, p-value = 0.0002939
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3647781 0.8390981
sample estimates:
      cor 
0.6641777

Here’s a more detailed explanation of the output of that test – we’ll go through each bit:

What you did

Pearson's product-moment correlation - There’s more than one way to calculate a correlation (see above). This test uses Pearson’s method.

data: gdata$ingroup and gdata$outgroup - This just reminds you which data you’re analyzing. It’s basically a copy of what you told R to analyze, i.e. gdata$ingroup, gdata$outgroup

alternative hypothesis: true correlation is not equal to 0 - This is a way of saying that, before looking at the data, you made no assumptions about whether the correlation would be positive or negative. This is sometimes called a two-tailed test. If you can safely assume a direction before looking at your data (a one-tailed test), see the One-tailed tests section below.

What you found

t = 4.2608 - This is the t value, the output of a t-test. We’re using a t-test here in a special way, to calculate the significance of a correlation co-efficient. A t value isn’t at all useful on its own, but together with the degrees of freedom (see below), we can use it to calculate the p value (also see below).

df = 23 - df is short for degrees of freedom. In a t-test, the degrees of freedom is the sample size, minus the number of means you’ve calculated from that sample. In this case, two means were calculated, one for each variable, so 25 pairs of scores give 25 - 2 = 23 degrees of freedom.

p-value = 0.0002939 - This is the p value of the t-test. It’s the probability of getting a correlation at least as far from zero as yours (in either direction), under the assumption that there is no correlation in the population (sometimes called the null hypothesis). You need the t value and the degrees of freedom to be able to calculate the p value … but R does those calculations for you.
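Here’s a sketch of the calculation R does for you, using the numbers from the output above. It uses the standard formula for testing a correlation against zero, t = r × √df / √(1 - r²), and assumes a sample of 25, which follows from df = 23.

```r
# Values from the cor.test output above; n = 25 follows from df = n - 2 = 23
r <- 0.6641777
n <- 25
df <- n - 2

# t value for testing whether a correlation differs from zero
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# Two-tailed p value: chance of a t at least this far from zero, in either
# direction, if the population correlation were zero
p_value <- 2 * pt(-abs(t_value), df)

print(c(t = t_value, df = df, p = p_value))
```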

sample estimates:
      cor 
0.6641777

The part above just tells you that the correlation co-efficient is 0.6641777 (you could have also got this number using the cor command).
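Each number in the printout is also stored in the result that cor.test returns, so you can pull it out by name instead of copying it by hand. A minimal sketch on made-up data:

```r
# Hypothetical data (made up for illustration)
set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)

result <- cor.test(x, y)

# Each part of the printout, pulled out by name
result$estimate   # correlation co-efficient (same as cor(x, y))
result$statistic  # t value
result$parameter  # degrees of freedom
result$p.value    # p value
result$conf.int   # 95% confidence interval
```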

95 percent confidence interval:
 0.3647781 0.8390981

This 95% confidence interval gives us a range of plausible values for the true correlation in the population: somewhere between about .36 and .84. If we had collected more data, this range would have been narrower, so we could have been more precise.
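For Pearson’s correlation, cor.test works out this interval using Fisher’s z transformation. Here’s a minimal sketch that reproduces the worksheet’s interval from r and the sample size (25, which follows from df = 23). It then shows that the same r from a hypothetical sample of 100 gives a narrower interval. The function name fisher_ci is made up for this example.

```r
# Confidence interval for Pearson's r via Fisher's z transformation
fisher_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)                       # transform r onto the z scale
  se <- 1 / sqrt(n - 3)               # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  tanh(z + c(-1, 1) * crit * se)      # transform back to the r scale
}

ci_25 <- fisher_ci(0.6641777, n = 25)    # the worksheet's result
ci_100 <- fisher_ci(0.6641777, n = 100)  # same r, bigger (made-up) sample
print(ci_25)
print(ci_100)
```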

The 95% confidence interval is the only thing reported by a traditional analysis that is both useful and easy to interpret. Psychologists are now encouraged to report it in their papers, like this:

Ingroup closeness correlated with outgroup distance, r = .66 [.36, .84], t(23) = 4.26, p < .05.
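If you report a lot of correlations, you can build this line in R rather than typing the numbers by hand. A sketch, using the numbers from the output above and following the style of the example (two decimal places, no leading zero for values that can’t exceed 1). The helper drop_zero is made up for this example. The p value is written as "p < .05", as in the example.

```r
# Hypothetical helper: round to 2 decimal places, then drop the leading
# zero (e.g. 0.66 becomes .66)
drop_zero <- function(x, digits = 2) {
  sub("^(-?)0\\.", "\\1.", formatC(x, format = "f", digits = digits))
}

# Numbers from the cor.test output above
r <- 0.6641777
ci <- c(0.3647781, 0.8390981)
t_value <- 4.2608
df <- 23L

report <- sprintf("r = %s [%s, %s], t(%d) = %.2f, p < .05",
                  drop_zero(r), drop_zero(ci[1]), drop_zero(ci[2]),
                  df, t_value)
print(report)
```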

One-tailed tests

In a one-tailed test, you decide before looking at your data which direction the effect should be in. For example, you may have read a lot of scientific papers about group relations, so you’re pretty sure that if you find a correlation between ingroup closeness and outgroup distance, that correlation will be positive.

In this case, you’d use this command to run the one-tailed test:

cor.test(gdata$ingroup, gdata$outgroup, alternative = "greater")

If instead your hypothesis was that the correlation would be negative, you’d use this command instead:

cor.test(gdata$ingroup, gdata$outgroup, alternative = "less")
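To see how the one-tailed and two-tailed tests relate, here’s a sketch on made-up data. For the direction the data actually go in, the one-tailed p value is half the two-tailed one. For the opposite direction, it is 1 minus that.

```r
# Hypothetical data (made up for illustration) with a positive correlation
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)

two_tailed <- cor.test(x, y)$p.value
greater <- cor.test(x, y, alternative = "greater")$p.value  # predicts positive
less <- cor.test(x, y, alternative = "less")$p.value        # predicts negative

print(c(two_tailed = two_tailed, greater = greater, less = less))
```

This is why you have to choose the direction before looking at your data. Picking it afterwards would halve your p value for free.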

This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.