## Before you start…

Before starting this worksheet, you should have completed all three previous worksheets of the Very Brief Guide to R. Once you have, you’ll have an R project that contains a script like this:

```r
# Exploring data (briefly)
library(tidyverse)
# Display mean income
cpsdata %>% summarise(mean(income))
# Calculate mean hours per week
cpsdata %>% summarise(mean(hours, na.rm = TRUE))

# Group differences (briefly)
# Group by sex, display mean income
cpsdata %>% group_by(sex) %>% summarise(mean(income))
# Display density plot of income, by sex
cpsdata %>% ggplot(aes(income, colour = factor(sex))) + geom_density(aes(y = ..scaled..))
# Filter people with income < $150K into 'cpslow'
cpslow <- cpsdata %>% filter(income < 150000)
# Display density plot of incomes below $150K, by sex
cpslow %>% ggplot(aes(income, colour = factor(sex))) + geom_density(aes(y = ..scaled..))
# EXERCISE
# Group by 'native', display mean income below $150K
cpslow %>% group_by(native) %>% summarise(mean(income))
# Display density plot of incomes below $150K, by 'native'
cpslow %>% ggplot(aes(income, colour = factor(native))) + geom_density(aes(y = ..scaled..))

# Evidence (briefly), part 1
library(BayesFactor, quietly = TRUE)
# Calculate Bayesian t-test for effect of 'sex' on 'income'
ttestBF(formula = income ~ sex, data = data.frame(cpsdata))
# Calculate traditional t-test for effect of 'sex' on 'income'
t.test(cpsdata$income ~ cpsdata$sex)
# Exercise
# Calculate Bayesian t-test for effect of 'native' on incomes below $150K
ttestBF(formula = income ~ native, data = data.frame(cpslow))
# Calculate traditional t-test for effect of 'native' on incomes below $150K
t.test(cpslow$income ~ cpslow$native)
```

## Loading new data

We’re going to use some new data in this final worksheet, so download it from here, upload it to RStudio, and then load the data into a dataframe called gdata. Look at the Exploring data worksheet if you need a reminder on how to do this. If you’ve done it correctly, you’ll get an output like this:

```
Rows: 25 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): grp, ingroup, outgroup, dominance

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
```

Next, click on gdata in the Environment window, and take a look.

The data is from an experiment where each of 25 groups of people selected a leader and then completed a task together. Afterwards, they answered some questions about their group. Specifically, they rated their ingroup closeness (how close, psychologically speaking, they felt to members of their own group), their outgroup distance (how distant they felt from members of other groups), and how dominant their group leader was in their group. The ratings were made individually, and then averaged to give one number per group per measure.

Here’s what each of the column labels means:

| Column | Description | Values |
|---|---|---|
| grp | ID number of the group | a number |
| ingroup | Group’s mean rating of ingroup closeness | 1 (low) - 10 (high) |
| outgroup | Group’s mean rating of outgroup distance | 1 (low) - 10 (high) |
| dominance | Group’s mean rating of the dominance of their group leader | 1 (low) - 10 (high) |

This is a small dataset comprising 25 groups.
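The averaging step described above (individual ratings reduced to one number per group) can be sketched with the group_by and summarise commands from earlier worksheets. The individual ratings below are made up for illustration only; gdata itself contains just the group means. The sketch loads dplyr, which library(tidyverse) also loads.

```r
# Illustrative sketch only: hypothetical individual ratings, NOT the real data
library(dplyr)

# Two made-up groups of three people, each person rating on a 1-10 scale
ratings <- data.frame(
  grp      = c(1, 1, 1, 2, 2, 2),
  ingroup  = c(8, 9, 10, 3, 4, 5),
  outgroup = c(2, 1, 3, 1, 1, 1)
)

# Average each measure within each group, giving one row per group
group_means <- ratings %>%
  group_by(grp) %>%
  summarise(across(c(ingroup, outgroup), mean))

group_means
```

Each row of group_means now plays the same role as one row of gdata: one number per group per measure.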
## Scatterplots

One question we can ask about these data concerns the relationship between ingroup closeness and outgroup distance. For example, does high ingroup closeness tend to be associated with high outgroup distance? Perhaps feeling close to your ingroup is associated with feeling distant from your outgroup. Or perhaps high ingroup closeness is associated with low outgroup distance: feeling close to your own group also makes you feel close to other groups. Or, a third option, perhaps the two things are unrelated: whether you have high or low ingroup closeness does not predict your outgroup distance.

One way to look at this question is to produce a scatterplot. On a scatterplot, each point represents one group. That point’s position on the x-axis represents the group’s ingroup closeness, and its position on the y-axis represents the group’s outgroup distance. The command to produce a scatterplot in R is much like the command for a density plot. It is:

```r
# Display scatterplot of 'ingroup' versus 'outgroup'
gdata %>% ggplot(aes(x = ingroup, y = outgroup)) + geom_point()
```

### Explanation of command

The command takes the data from the gdata dataframe and pipes it (%>%) to ggplot to produce a graph. The rest of the command tells ggplot what type of graph we want:

geom_point() - We want a scatterplot.

aes(x = ingroup, y = outgroup) - We want the variable ingroup on the x-axis, and the variable outgroup on the y-axis.

### Discussion of output

In the above scatterplot, many of the points are close to the x-axis. This is because, as we saw above, most groups gave a rating close to 1 for outgroup distance. However, once ingroup closeness goes above 8, an interesting pattern starts to emerge: as ingroup closeness increases from 8 to 10, outgroup distance rises from around 1 to around 7 or 8. So it seems that, in this example dataset, ingroup closeness and outgroup distance are related. We call this type of relationship a correlation.
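To make the "one point per group" idea concrete, here is a sketch using a made-up three-group data frame (not gdata). It passes the data straight to ggplot() instead of piping it, which produces the same kind of plot, and adds labs(), a standard ggplot2 function, to give the axes readable labels. The column names mirror gdata; the values are invented.

```r
# Sketch on made-up data: three hypothetical groups, one point each
library(ggplot2)

toy <- data.frame(
  grp      = c(1, 2, 3),
  ingroup  = c(2, 6, 9),  # x-axis position of each point
  outgroup = c(1, 1, 7)   # y-axis position of each point
)

# Same structure as the gdata command, plus descriptive axis labels
p <- ggplot(toy, aes(x = ingroup, y = outgroup)) +
  geom_point() +
  labs(x = "Ingroup closeness (1-10)", y = "Outgroup distance (1-10)")

print(p)
```

Group 3 appears at the top right because it has both the highest ingroup closeness and the highest outgroup distance.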
## Measuring correlation

Sometimes, it’s useful to have a single number that summarises how well two variables are correlated. We can calculate this number, called a correlation co-efficient, using the cor command in R:

```r
# Display correlation co-efficient for 'ingroup' versus 'outgroup'
cor(gdata$ingroup, gdata$outgroup)
```

```
[1] 0.6641777
```

### Explanation of command

Here’s what each part of the command means:

cor() - The command to calculate a correlation co-efficient.

gdata$ingroup - One variable is in the ingroup column of the gdata data frame.

, - This comma needs to be here so R knows where one variable ends and the other begins.

gdata$outgroup - The other variable is in the outgroup column of the gdata data frame.
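By default, the number cor() reports is Pearson's correlation coefficient, which runs from -1 (a perfect negative relationship) through 0 (no linear relationship) to 1 (a perfect positive relationship). The sketch below uses made-up numbers, not gdata, to compute it step by step and check the result against cor().

```r
# Sketch on made-up numbers: Pearson's r by hand, compared with cor()
x <- c(2, 4, 6, 8, 10)  # hypothetical 'ingroup' scores
y <- c(1, 1, 2, 5, 8)   # hypothetical 'outgroup' scores

# Deviations of each score from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# r = sum of products of deviations / sqrt(product of summed squared deviations)
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_by_hand
cor(x, y)  # same value as r_by_hand
cor(y, x)  # swapping the variables gives the same value
```

Because swapping the two variables gives the same answer, the order of the variables inside cor() doesn't matter.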

## Exercise

As in your script so far, mark the start of this part with a comment:

```r
# EXERCISE
```

In this exercise, you’ll apply what you’ve learned to the relationship between ingroup closeness and group-leader dominance. Do each of the following analyses, adding the appropriate comments and commands to your script:

1. Make a scatterplot with ingroup closeness on the x-axis, and group-leader dominance on the y-axis.

2. Calculate the correlation co-efficient for ingroup versus dominance.

3. Calculate the Bayes Factor for this correlation.
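A Bayes factor for a correlation can be calculated, assuming the BayesFactor package's correlationBF() function (consistent with the "BFcorrelation" type in the expected output below), from two numeric variables. Here is a minimal sketch on made-up data, not the exercise variables, so you still need to work out the exercise commands yourself.

```r
# Minimal sketch on made-up data (not gdata).
# Assumes the BayesFactor package is installed and provides correlationBF().
library(BayesFactor, quietly = TRUE)

# Hypothetical scores for eight groups
x <- c(2, 3, 4, 5, 6, 7, 8, 9)
y <- c(1, 2, 2, 4, 4, 6, 7, 9)

# Bayes factor for a correlation between x and y, versus a null of zero correlation
bf <- correlationBF(y = y, x = x)
bf

# Pull out the Bayes factor as a plain number
extractBF(bf)$bf
```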

### Expected output

If you’ve done it right, these are the answers you’ll get:

```
[1] -0.8196067

Bayes factor analysis
--------------
[1] Alt., r=0.333 : 10578.51 ±0%

Against denominator:
  Null, rho = 0
---
Bayes factor type: BFcorrelation, Jeffreys-beta*
```

## The End!

If you’re able to complete the above exercise on your own, you’re all set! If not, ask for help in class, and/or work through the Absolute Beginners’ Guide to R.