Before you start…

Before starting this exercise, you should have completed all the Absolute Beginners’ workshop exercises. If not, take a look at those exercises before continuing. Each section below also indicates which of the earlier worksheets are relevant.

Getting the data into RStudio Server

Relevant worksheet: Intro to RStudio, Exploring data

In order to complete this worksheet, you’ll need to have downloaded your CSV file from the PsycEL exercise. See the instructions on PsycEL for how to do this.

Once you have downloaded your CSV file, open a project on RStudio Server for this analysis, create a script file, and upload your CSV to your project.

Plymouth University students: Create/open your project named psyc412; within that create a script file called memories.R. Enter all commands into that script and run them from there.

Finally, load the tidyverse package, and load your data.

# Memories from life
# Load package
library(tidyverse)
# Load data
mems <- read_csv("memories-single.csv")

Your CSV file may have a different name to the example above. If so, you will need to change memories-single.csv to the name of your file.

Making a histogram

Relevant worksheet: Exploring data.

Are memories from all time periods about equally common? Or are recent memories more common than remote ones? Or perhaps some other pattern? A histogram can help us to answer this question by visualising our data. You covered how to make a histogram in the Exploring Data worksheet. In this case, our data of interest are in the period column of the mems data frame, so the command we use is:

# Plot a histogram of 'period'
mems %>% ggplot(aes(period)) + geom_histogram(binwidth=.5) 

Explanation

  • Your histogram will look something like the above, but the heights of the bars will likely be somewhat different.

  • The binwidth has been set to .5 here to make a gap between each bar in the histogram. Try changing binwidth to 1 to see what effect it has on your plot.

Improving the histogram

Not bad…but it could be better. In particular, having the time periods labelled as numbers doesn’t make for a very readable graph; it would be better if we used more meaningful labels. We can use the scale_x_continuous command of ggplot to add our own labels to a histogram:

# Plot the histogram with labels on the x-axis
mems %>% ggplot(aes(period)) + 
  geom_histogram(binwidth=.5) + 
  scale_x_continuous(limits = c(0.75,5.25), breaks = 1:5, labels = c("Fred", "Wilma", "Barney", "Betty", "Pebbles"))

Explanation: The command scale_x_continuous contains the words breaks, labels and limits.

breaks = 1:5 tells R we want a bar for each of the periods 1, 2, 3, 4 and 5.

labels gives the label for each of those bars, in order.

limits tells R what to use as the minimum and maximum values for the x axis. It is important to include this as well as setting breaks because otherwise R will ignore zero-height bars. We set the range from 0.75 to 5.25 (rather than from 1 to 5) because we need to leave room for the width of the bars (which we set to .5 using binwidth)

Exercise

# EXERCISE

Add the above to your script and then complete the exercise below

  1. Modify the command above to add more meaningful labels to your histogram. If you get it right, it’ll look something like this (without the words “example plot”, of course):

  1. Export your histogram, using the Export icon on RStudio’s Plots window, and selecting “Save as image…”. Give it a meaningful file name (e.g. “memories-hist”) and click ‘Save’.

  2. Download your histogram from RStudio server - see these instructions for a reminder of how to do this.

  3. Upload your histogram to PsycEL (see the PsycEL activity for instructions of how to do this).


This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.