Contents

Getting started

Return to the project that you used for the understanding interactions worksheet, which you named rminr-data. Open a new blank script and save it as anova3.R. Put all commands in that script, and save the script regularly.

This worksheet makes use of the preprocessed MCwordsCIsum data frame we made in the understanding interactions worksheet. Start by running the following commands to make sure you have this data frame in your environment.

Enter these commands into your R script and run them:

library(tidyverse)
library(BayesFactor)
words <- read_csv("wordnaming2.csv")
MCwords <- words %>% filter(medit != "relax")
MCwordsCI <- MCwords %>% filter(congru != "neutral")
MCwordsCIsum <-
    MCwordsCI %>% group_by(subj, medit, congru) %>% summarise(rt = mean(rt))

Introduction

In this worksheet, we’ll use the anovaBF command to analyse the data from an experiment with a factorial design. In the understanding interactions worksheet, we said that a factorial experiment design is one where there are at least two factors (e.g. training type, trial type) and we have data for all combinations of those factors (e.g. meditate-congruent, meditate-incongruent, control-congruent, control-incongruent). When you have a factorial design with two factors, there are three questions you can ask:

  1. Is there a main effect of the first factor (e.g. training type)?

  2. Is there a main effect of the second factor (e.g. trial type)?

  3. Is there an interaction between these two factors?

The command anovaBF allows us to answer all these three questions in one go.

Defining factors

Take a look at the data frame MCwordsCIsum. It gives the mean congruent reaction time, and mean incongruent reaction time, for each participant. As before, you have to tell R which of the columns of this data frame are factors. The columns medit and congru are factors - they are the two factors of the design. Recall that the participant ID is also a factor. So, you need to set three of the four columns of MCwordsCIsum as factors, as follows.

Enter these commands into your R script and run them:

MCwordsCIsum$subj <- factor(MCwordsCIsum$subj)
MCwordsCIsum$medit <- factor(MCwordsCIsum$medit)
MCwordsCIsum$congru <- factor(MCwordsCIsum$congru)

Bayesian factorial ANOVA

Once you have told R which columns are factors, you’re ready to answer these three questions:

  1. Is there a main effect of the first factor (e.g. training type)?

  2. Is there a main effect of the second factor (e.g. trial type)?

  3. Is there an interaction between these two factors?

This is a big calculation for R, so run the following commands, and then read the explanation while you’re waiting for the results. It could take up to a minute to get the answer.

Enter this command into your R script and run it:

bf <- anovaBF(formula = rt ~ medit*congru + subj,
              data = data.frame(MCwordsCIsum), whichRandom = "subj")

Explanation of command

We’ll look at each part of this command but, rather than looking at each in turn, we’ll look at the crucial part first, which is this:

Formula

formula = rt ~ medit*congru + subj

This tells anovaBF what type of ANOVA we want. It says that the dependent variable is the reaction time column rt, and that the independent variables are the training type medit and the trial type congru. The * means “do a factorial ANOVA”. In other words, it means work out a Bayes Factor for the main effect of each factor, and for the interaction. The final part, + subj, is something we always include in anovaBF, and just says that the participant IDs are to be found in the subj column.

Random factors

You also have to tell anovaBF which factors are random factors. In most cases we’ll deal with in this guide, only the participant ID is a random factor. Hence:

whichRandom = "subj"

Storing results in an object

bf <-

As the calculation takes a while to run, it makes sense to save the results of the calculation, so we only ever have to do it once. This command writes the output of anovaBF to an object called bf. Until now, most objects you’ve seen have been data frames. But there are other types of object in R including ones, like this one, that contain the results of calculations.

Everything else

anovaBF - This is the same command as we’ve used for a while now. It’s quite flexible!

data = data.frame(MCwordsCIsum) - As in previous examples of anovaBF, this just says where to find the data (i.e. in the MCwordsCIsum data frame).

Explanation of output

Once calculation has finished, take a look at the results. You do this by typing in the name of the object you stored the results in:

bf
Bayes factor analysis
--------------
[1] medit + subj                         : 349.7556   ±1.24%
[2] congru + subj                        : 788.9524   ±1.99%
[3] medit + congru + subj                : 329271.8   ±3.85%
[4] medit + congru + medit:congru + subj : 1744491117 ±1.72%

Against denominator:
  rt ~ subj 
---
Bayes factor type: BFlinearModel, JZS

The exact figures in your output may be slightly different to those shown.

The key parts of this output are the Bayes Factors, which are the numbers immediately after the colons. A couple of things to notice here. First, there are four Bayes factors, when we might have expected to see three. We’ll come back to that later. Second, there are things like ±2.66% after the Bayes Factors. This means that R has estimated the Bayes Factors, and is able to tell you how accurate that estimate is. So, for example, 10 ± 10% means the Bayes Factor is somewhere between 9 and 11.

We’ll look at each of these Bayes factors in turn. Notice they are numbered, e.g. [1] - this will become useful later. Also notice that they all include + subj. This just reminds us that we’ve told anovaBF which data comes from which participant, so we’ll ignore the + subj in our descriptions from here on.

[1] medit

This is the main effect of training type, medit. More specifically, it is a test of the hypothesis that medit, affects rt.

Recall that a Bayes Factor is a comparison of evidence of two hypotheses. So, in a Bayesian t-test, for example, a BF of 3 means there is three times as much evidence in favour of a difference between groups (the experimental hypothesis) as in favour of their not being a difference (the null hypothesis).

The Bayes Factor for this hypothesis is around 340, which is really strong evidence that there is a main effect of meditation training. But which direction is the main effect in? Does meditation training make you faster overall, or slower overall?

We can find this out by using the group_by and summarise commands we have used several times before.

Enter this command into your R script and run it:

MCwordsCIsum %>% group_by(medit) %>% summarise(mean(rt))
# A tibble: 2 x 2
  medit    `mean(rt)`
  <fct>         <dbl>
1 control        520.
2 meditate       488.

From this, we can see that people in the meditation condition are about 30 ms faster, on average, than those in the control condition. Meditation training seems to have led to a reduction in overall reaction times in this task.

[2] congru

This is the main effect of congruence. More specifically, it is the hypothesis that congru affects rt. This is compared against the hypothesis that there is no effect of congruence.

The Bayes Factor is about 860, strong evidence for a main effect of congruence. As before, we can find out which direction this effect is in.

Enter this command into your R script and run it:

MCwordsCIsum %>% group_by(congru) %>% summarise(mean(rt))
# A tibble: 2 x 2
  congru `mean(rt)`
  <fct>       <dbl>
1 cong         490.
2 incong       518.

As expected, incongruent trials are slower than congruent trials, on average.

The interaction

anovaBF does not directly give us a Bayes Factor for the interaction of the two main effects. Instead, it gives us two Bayes Factors for things that we can use to work out the interaction BF. These are:

[3] medit + congru

This the hypothesis that there is a main effect of both factors. There is no assumption that the two main effects are of the same size. This ‘main effects’ hypothesis is compared against the null hypothesis, i.e. that neither medit nor congru affect rt.

The BF for this hypothesis is large (about 300,000). We’d expect this, given that there was substantial evidence for both congru alone and medit alone.

[4] medit + congru + medit:congru

This is the Bayes Factor for the hypothesis that there are main effects for both factors (medit + congru) and that the two factors interact (+ medit:congru). This is again compared against the null hypothesis that neither medit nor congru have any effect.

The BF for this hypothesis is also large (about \(1.8 \times 10^{9}\)). We’d expect this, because there was substantial evidence for the ‘main effects’ hypothesis medit + congru.

Interaction BF

Remember that a Bayes Factor is always a comparison of two types of evidence. So far, we’ve always compared some experimental hypothesis with the null hypothesis. But we can also compare the evidence for two different experimental hypotheses. The hypothesis that there is an interaction is the hypothesis that there is something more going on that just the combination of main effects. For example, the hypothesis that the congruency effect is smaller after mediation. So, to get a Bayes Factor for the interaction, we compare the evidence for hypothesis [4] with the evidence for hypothesis [3]. To do this, we divide the Bayes Factor for [4] by the Bayes Factor for [3].

In R we can use the bf object to do this.

Enter this command into your R script and run it:

bf[4] / bf[3]
Bayes factor analysis
--------------
[1] medit + congru + medit:congru + subj : 5298.027 ±4.22%

Against denominator:
  rt ~ medit + congru + subj 
---
Bayes factor type: BFlinearModel, JZS

This gives us a Bayes Factor for the interaction close to 6000. So, there is strong evidence for the interaction, too.

Exercise

As covered in the within-subject differences, one of the strengths of anovaBF is that it’s not limited to factors with two levels. In our meditation experiment, there were three between-subjects training conditions (meditate, relaxation, none), and three within-subjects trial types (congruent, incongruent, neutral). The goal of this exercise is to write an R script to analyse the full experiment.

Below are further instructions. In some cases, there are also examples of what your output should look like at each stage if you’ve got it right (recall that your BF may be slightly different to those shown). Once your script is working correctly, copy and paste it into PsycEL.

Start a new R script in your current project (rminr-data), call it anova3ex.R and begin it with rm(list=ls()) to make sure you’ve cleared everything from your environment. Then, write R commands to do the following (see below). Only include the commands that are needed to do this, and use meaningful names for your variables.

  1. Load packages and data,

  2. Preprocess that data and place it in a data frame called wordsum

`summarise()` has grouped output by 'subj', 'medit'. You can override using the `.groups` argument.
  1. Tell R which columns of your preprocessed data are factors

  2. Calculate a factorial Bayesian ANOVA for this full 3 (meditate, relax, control) x 3 (congruent, incongruent, neutral) data set, and store it in an object (e.g. bfall)

  3. Show the results of that calculation

Bayes factor analysis
--------------
[1] medit + subj                         : 825.4065     ±0.72%
[2] congru + subj                        : 2.873857e+16 ±2.43%
[3] medit + congru + subj                : 2.409673e+19 ±1.38%
[4] medit + congru + medit:congru + subj : 1.253528e+27 ±3.88%

Against denominator:
  rt ~ subj 
---
Bayes factor type: BFlinearModel, JZS
  1. Calculate the BF for the interaction.
Bayes factor analysis
--------------
[1] medit + congru + medit:congru + subj : 52020688 ±4.12%

Against denominator:
  rt ~ medit + congru + subj 
---
Bayes factor type: BFlinearModel, JZS
  1. Report the mean RTs for the main effect of each condition.
# A tibble: 3 x 2
  medit    `mean(rt)`
  <fct>         <dbl>
1 control        520.
2 meditate       488.
3 relax          517.
# A tibble: 3 x 2
  congru  `mean(rt)`
  <fct>        <dbl>
1 cong          491.
2 incong        526.
3 neutral       508.
  1. Calculate the mean RT for each of the nine conditions of the experiment, and place these data in a new data frame so you can graph them:
`summarise()` has grouped output by 'medit'. You can override using the `.groups` argument.
# A tibble: 9 x 3
# Groups:   medit [3]
  medit    congru     rt
  <fct>    <fct>   <dbl>
1 control  cong     492.
2 control  incong   548.
3 control  neutral  519.
4 meditate cong     489.
5 meditate incong   487.
6 meditate neutral  489.
7 relax    cong     492.
8 relax    incong   543.
9 relax    neutral  517.
  1. Graph

Use a line graph, similar to the ones you produced earlier, to show these nine means. Note that ggplot orders conditions alphabetically, so by default your x-axis would come out in the order cong, incong, neutral. It makes more sense to use the order cong, neutral, incong as the neutral trials (those without pictures) are, in some sense, in between the congruent trials (helpful pictures) and the incongruent trials (unhelpful pictures). You can reorder the points on the x-axis of a graph using the scale_x_discrete command. In this case, you can set the correct order by adding this to your graph commands:

scale_x_discrete(limits=c("cong", "neutral", "incong"))

Pairwise comparisons

Although that’s the end of the exercise, in a full analysis you would probably want to go further and look at particular pairs of conditions within the experiment. You do this as before, by using the filter command to select the data you want to analyse. For more details, see the within-subject differences worksheet.

Better graphs

The graph you produced in the above exercise is quite common in psychology experiments, but it’s not particularly informative because it gives only the average response for each condition. It gives no sense of how much people vary from one another.

One common approach to this problem is to add error bars, or confidence intervals to a line plot or bar plot. These are ways of showing variability around the mean with a little “I” shaped mark and although we don’t cover them in this course, you’ll see them in a lot of journal articles.

Unfortunately, very many of those articles use them wrongly for within-subjects factors, because they use the variability of each level of the factor to calculate the confidence intervals. This is wrong because it is the variability of the differences between conditions that are relevant in a within-subjects design, not the variability of the individual conditions. The former will often be smaller than the latter. Indeed, this difference is one of the main reasons to favour within-subjects designs, see here for more details.

A better way of showing variability in a within-subjects factor is to use a density plot of the differences, as we have in these worksheets. Where you also have a between-subjects factor, you can overlay those difference-density plots, as in this figure. You can alternatively use boxplots or violin plots, which some people find more attractive and/or easier to interpret.

In cases where your within-subjects factor has more than two levels, this can get more tricky but generally the solution is to pick those pairs of levels that are important for your hypothesis, and do difference-density plots of those.

Reporting Bayesian ANOVA

When it comes to reporting the results of a Bayesian ANOVA, you just give the Bayes Factor in the appropriate part of your text. For example:

There was a main effect of training type, \(BF = 814\), a main effect of congruency, \(BF = 2.8 \times 10^{16}\), and an interaction between these two factors, \(BF = 5.4 \times 10^{7}\).

It’s important to remember that Bayes Factors without means are basically meaningless. So, you need to show, for example, a graph of the means for each condition (as above) for the reader to make sense of your analysis.

There are a number of different ways to do Bayesian calculations, and these can lead to somewhat different results. So, it’s really important to also say exactly what calculation you did. In this case you would say:

We performed a factorial Bayesian ANOVA with one within-subjects factor (congruency) and one between-subject factor (training type), using the BayesFactor package (Morey & Rouder, 2018) in R (R Core Team, 2019).

It’s also important to include those references in your Reference section. R will tell you the reference for a package if you type e.g. citation("BayesFactor"). The reference for R itself is found by typing citation(). Note that R is what is doing your calculations, while RStudio is a web page that makes it easier to use R. RStudio does not have any affect on the output you get. So, you don’t normally cite RStudio in your writeups.

Ordinal factors

A final important thing to realise about ANOVA is that it does not care about the order of the levels in your factors. For example, the full data set you have been analyzing includes block number. Participants do 30 trials, then take a break, then do another 30 trials, and so on. So each response is either in block 1, 2, or 3. If people were getting tired, you might see reaction times rise from blocks 1 to 2 and again from blocks 2 to 3. You might also find Bayesian evidence for a main effect of block (e.g. BF = 30).

It’s important to realise that this ANOVA Bayes Factor tells you only that the three groups differ, not that 3 is greater than 2 and 2 is greater than 1. There are two ways of finding evidence for that sort of question. First, you could do two pairwise comparisons (1 vs. 2, and 2 vs. 3). Alternatively, you could use a different analysis method that takes account of the fact that block is an ordinal factor (i.e. that it has a specific order). Regression is often a good choice in these cases.


This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.