Before starting this exercise, you should have completed all the relevant Absolute Beginners’, Part 1 worksheets. Each section below indicates which of the earlier worksheets are relevant.
Relevant worksheet: Using RStudio projects
In this exercise, you’ll be analysing some data that you and your peers recently collected. To get this data into R, follow these steps:
Set up an RStudio project for this analysis.
Upload the CSV file you have been given for this activity into your RStudio project folder. If you want to try out this worksheet without that data file, you can use this example CSV file instead. You can only complete your PsycEL activity if you use the CSV file you were sent.
Load the tidyverse package, and then load your data into R.
library(tidyverse)
data <- read_csv("green.csv")
Note: In the example above, you’ll need to replace green.csv with the name of the CSV file you just uploaded into your RStudio project.
Look at the data by clicking on it in the Environment tab in RStudio.
Each row is a rating by one participant in this study of creativity. Groups of participants came up with a creative solution to a problem, while either taking a walk in an urban environment or a nature environment. Each of these solutions has been rated for creativity by a set of raters.
Will the nature environment lead to more creative ideas than the urban environment?
|Column|Description|Values|
|---|---|---|
|Solution|Number of the solution|a number|
|Rater|Reference number of the person rating the solution|a number|
|Cond|Which environment was the creator in?|“Urban”, “Nature”|
|score|How creative was the idea rated to be?|0-100, higher numbers = more creative|
Relevant worksheet: Group Differences
We start by “pre-processing” our data, in order to make it easier to analyse. We do this in two steps:
In some cases, a participant did not provide a rating of a solution – this is then represented in the dataset as NA. R uses NA to specify that this data point is missing – in this case, because the participant didn’t respond.
Although it’s good to explicitly record that a response was not made, keeping these NA in the dataset will cause problems later on, so we’re going to remove them:
data <- data %>% drop_na(score)
drop_na(score) is new – it just means remove the rows of the dataset where the score is recorded as NA. The rest of the command uses things we covered in the Group Differences worksheet – the dataframe data is sent (i.e., piped, %>%) to the drop_na() command, which removes the NA, and the results are stored (<-) back in the data dataframe.
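To see why NA values cause problems, it can help to try a small made-up example. The sketch below uses invented numbers (the data frame toy is hypothetical and is not part of your data set). It shows that mean() returns NA when any value is missing, and that drop_na(score) removes only the rows with a missing score.

```r
library(tidyverse)

# Made-up ratings, for illustration only: one rating is missing (NA)
toy <- tibble(
  Solution = c(1, 1, 2, 2),
  score    = c(40, NA, 55, 65)
)

# With an NA present, the mean is itself NA
mean(toy$score)

# drop_na(score) removes the row where score is NA...
toy_clean <- toy %>% drop_na(score)

# ...leaving three rows, so the mean can now be calculated
nrow(toy_clean)
mean(toy_clean$score)
```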
Each solution was rated by several people. We’re going to take the average (mean) of those ratings, so we’re left with one creativity score per solution. We use the group_by, summarise and mean commands we used in the Group Differences worksheet to do this:
creative <- data %>% group_by(Cond, Solution) %>% summarise(score = mean(score))
`summarise()` regrouping output by 'Cond' (override with `.groups` argument)
As before, you can safely ignore the message about regrouping that you receive.
If we look at this summarised data, by clicking on the Environment tab of RStudio, we can see that we now have one creativity score per solution.
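The message after summarise() appears because the data were grouped by two columns. If you would rather not see it, one option is to say explicitly what grouping you want, using the .groups argument that the message mentions (this assumes dplyr 1.0.0 or later). A minimal sketch with made-up numbers, where toy is a hypothetical data frame:

```r
library(tidyverse)

# Made-up ratings, for illustration only: two raters per solution
toy <- tibble(
  Cond     = c("Nature", "Nature", "Urban", "Urban"),
  Solution = c(1, 1, 2, 2),
  score    = c(40, 50, 30, 50)
)

# .groups = "drop" returns an ungrouped data frame, so no message is printed
toy_means <- toy %>%
  group_by(Cond, Solution) %>%
  summarise(score = mean(score), .groups = "drop")

# One row per solution, each holding the mean of its ratings
toy_means
```

Because the next step groups by Cond again, dropping the grouping here should not change the results.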
We start by looking to see how the mean creativity scores differ for those who were in a nature or an urban environment. We can do this using the group_by and summarise functions in a similar way to before, but on our preprocessed data, which we have stored in the data frame creative:
creative %>% group_by(Cond) %>% summarise(mean(score))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
  Cond   `mean(score)`
  <chr>          <dbl>
1 Nature          42.7
2 Urban           39.2
Your output will look similar to this, but the numbers will probably be different. In this example, it looks like there’s a small difference, with the creativity ratings slightly higher in the Nature environment – but how does this between-group difference compare to the within-group variability? As we covered in the Group Differences worksheet, this is most easily looked at with a scaled density plot:
creative %>% ggplot(aes(score, colour=factor(Cond))) + geom_density(aes(y=..scaled..)) + xlim(0, 100)
Explanation of command: The only new part here is xlim(0, 100), which sets limits on the x-axis of your graph. Specifically, it forces the lowest value on the x-axis to be 0 and the highest value to be 100. Without xlim, R chooses limits that it thinks are sensible. Like all computer programs, R isn’t that bright, so often it makes sense to tell it more precisely what you want.
In this example, the graph tells a somewhat different story to the means - although a difference between groups is visible, it is small compared to the variability within each group.
We can express the size of the difference in means, relative to the within-group variability, as an effect size. As we said in the Group Differences worksheet, we calculate an effect size in R like this:
library(effsize)
cohen.d(creative$score ~ creative$Cond)
Cohen's d
d estimate: 0.2145072 (small)
95 percent confidence interval:
     lower      upper
-0.5973451  1.0263596
In this example, the effect size is around 0.21, which is typically described as a small effect. The effect size for your data may be different.
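To make the idea of a difference in means relative to within-group variability more concrete, here is a sketch of the calculation done by hand on made-up scores. It assumes the common pooled standard deviation form of Cohen's d, which, as far as we are aware, is what cohen.d() uses by default. The vectors nature and urban are hypothetical.

```r
# Made-up creativity scores, for illustration only
nature <- c(45, 50, 38, 42, 55)
urban  <- c(40, 36, 48, 30, 41)

# Difference between the two group means
mean_diff <- mean(nature) - mean(urban)

# Pooled standard deviation: a weighted average of the two group variances
n1 <- length(nature)
n2 <- length(urban)
pooled_sd <- sqrt(((n1 - 1) * var(nature) + (n2 - 1) * var(urban)) / (n1 + n2 - 2))

# Cohen's d: the mean difference in units of within-group standard deviation
d <- mean_diff / pooled_sd
d
```

A larger d means the groups overlap less. The same difference in means gives a smaller d when the scores within each group are more spread out.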
At this point, the most pressing question is probably whether the difference observed in the mean scores is likely to be real, or whether it’s more likely down to chance. As we saw in the Evidence worksheet, the best way to look at this is with a Bayesian t-test:
library(BayesFactor, quietly = TRUE)
ttestBF(formula = score ~ Cond, data = data.frame(creative))
Bayes factor analysis
--------------
 Alt., r=0.707 : 0.4053154 ±0%
Against denominator:
  Null, mu1-mu2 = 0
---
Bayes factor type: BFindepSample, JZS
The Bayes Factor in this case is approximately 0.4 (0.41 to be more precise), meaning it’s about two and a half times as likely that there isn’t a difference as that there is (1 / 0.41 ≈ 2.5). Your number will likely be a bit different.
Enter the mean creativity score for each condition, the effect size, and the Bayes Factor for the difference, into PsycEL.
Using the convention that there is a difference if BF > 3, that there isn’t a difference if BF < 0.33, and that we’re unsure if it’s between 0.33 and 3, select “difference”, “no difference”, or “unsure” on PsycEL.
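The convention above can also be written as a short function. This is only a sketch: classify_bf is a hypothetical helper name, not part of the BayesFactor package, and the cut-offs are the ones stated in this worksheet.

```r
# Hypothetical helper: turn a Bayes Factor into one of the three PsycEL answers,
# using the cut-offs given in this worksheet (3 and 0.33)
classify_bf <- function(bf) {
  if (bf > 3) {
    "difference"
  } else if (bf < 0.33) {
    "no difference"
  } else {
    "unsure"
  }
}

# The Bayes Factor from the example output in this worksheet
classify_bf(0.4053154)
```

For the example Bayes Factor of about 0.41, this gives "unsure", because 0.41 lies between 0.33 and 3.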
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.