It can be helpful to present data in tables, rather than text, especially when you need to refer to the same data in different parts of a report. Although tables can be produced manually using a word processor, generating them directly from your data ensures they are up-to-date, and reduces copy-paste errors. This worksheet explains how to use
R to produce some of the types of table used to report psychological research.
To prepare for this worksheet:
Open the rminr-data project we used previously.
If you don’t see a folder named
going-further, it means you created your project before the data required for this worksheet was added to the
rminr-data git repository. You can get the latest files by asking git to “
pull” the repository. Select the
Git tab, which is located in the row of tabs which includes the
Environment tab. Click the
Pull button with a downward pointing arrow. A window will open showing the files which have been pulled from the repository. Close the
Git pull window.
Files tab. The
going-further folder should contain the file
Create a script named
tables.R in the
rminr-data folder (the folder above
going-further). Add the code to this script as you work through each section of the worksheet.
We’ll start by producing a correlation matrix. A correlation matrix shows correlations between all combinations of a set of variables, which is often required in research reports. We’ll demonstrate an easy way to produce correlation matrices, with APA styling, in a format that can be read by Microsoft Word or LibreOffice Writer. A similar approach can be used to produce other common table types.
We’ll generate a correlation matrix using the
attitude dataset, which is included with
R. These data are the percentage of favourable attitudes given by employees, in relation to seven questions regarding their department (you can find out a bit more about these data by typing
?attitude). Here are the first few rows of the data frame:
We’ll use the
apaTables package to generate the correlation matrix.
rm(list = ls()) # clear the environment
library(apaTables)
apa.cor.table(attitude, filename="table1.doc", table.number = 1)
Table 1

Means, standard deviations, and correlations with confidence intervals

Variable       M     SD    1           2           3           4           5           6
1. rating      64.63 12.17

2. complaints  66.60 13.31 .83**
                           [.66, .91]

3. privileges  53.13 12.24 .43*        .56**
                           [.08, .68]  [.25, .76]

4. learning    56.37 11.74 .62**       .60**       .49**
                           [.34, .80]  [.30, .79]  [.16, .72]

5. raises      64.63 10.40 .59**       .67**       .45*        .64**
                           [.29, .78]  [.41, .83]  [.10, .69]  [.36, .81]

6. critical    74.77 9.89  .16         .19         .15         .12         .38*
                           [-.22, .49] [-.19, .51] [-.22, .48] [-.25, .46] [.02, .65]

7. advance     42.93 10.29 .16         .22         .34         .53**       .57**       .28
                           [-.22, .49] [-.15, .54] [-.02, .63] [.21, .75]  [.27, .77]  [-.09, .58]

Note. M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation (Cumming, 2014). * indicates p < .05. ** indicates p < .01.
Explanation of commands:
We load the apaTables package. The function to generate a correlation matrix is apa.cor.table(). We pass the attitude data frame as the first argument, and use filename to specify the file in which the output should be saved. The table.number argument sets the number in the table heading, in this case “Table 1”. If you omit this argument, the heading will read “Table XX”.
Explanation of output:
table1.doc from RStudio and open it using a word processor
The first thing to notice is that the styling (spacing, use of italics, horizontal lines, positioning of captions and footnotes etc.) complies with the APA guidelines for tables.
The table number and caption are above the table itself. You will need to edit the caption by hand to make it more meaningful, for example “Means, standard deviations, and correlations with confidence intervals, for the attitude measures of Study 1”.
The Variable column contains a number and the column name for each of the seven attitude variables. The next two columns show the mean and standard deviation for each variable. The remaining columns use the numbers from the Variable column as headings, indicating that they refer to the same variables. Each cell shows the correlation between its column variable and its row variable. Cells are left empty where a variable would otherwise be correlated with itself, or where the same correlation already appears elsewhere in the table. The 95% confidence interval for each correlation is shown in square brackets.
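To see where the numbers in the table come from, the M, SD and correlation columns can be reproduced with base R alone. This is only a sketch for checking values, not a description of how apaTables works internally; the object names descriptives and cor_matrix are our own.

```r
# Base R only; `attitude` ships with R in the datasets package
data(attitude)

# Mean and standard deviation of each variable (the M and SD columns)
descriptives <- data.frame(
  M  = round(colMeans(attitude), 2),
  SD = round(sapply(attitude, sd), 2)
)
print(descriptives)

# Pearson correlations between every pair of variables
cor_matrix <- round(cor(attitude), 2)

# Blank out the diagonal and upper triangle so the layout matches the table
cor_matrix[upper.tri(cor_matrix, diag = TRUE)] <- NA
print(cor_matrix, na.print = "")
```

The printed matrix has the same lower-triangle layout as the table, without the confidence intervals or asterisks.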
For example, the correlation between rating and complaints in this sample is .83. The confidence interval indicates that the population value is likely to be between .66 and .91.
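A single cell of the table, such as the correlation between rating and complaints, can be checked with base R's cor.test(). The sketch below assumes a Pearson correlation with a two-tailed test and a 95% confidence interval, which are the cor.test() defaults; the variable names r, ci and p are our own.

```r
# Base R only; `attitude` ships with R in the datasets package
data(attitude)

# Pearson correlation, two-tailed test, 95% confidence interval (the defaults)
result <- cor.test(attitude$rating, attitude$complaints)

r  <- unname(result$estimate)     # sample correlation
ci <- as.numeric(result$conf.int) # lower and upper confidence limits
p  <- result$p.value              # two-tailed p-value

print(round(r, 2))
print(round(ci, 2))
print(p < .01)
```

Comparing the printed values with the corresponding cell in Table 1 is a quick way to confirm that you are reading the table correctly.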
Evidence for the correlation is calculated using traditional statistics, rather than the Bayes factors described in the Relationships, part 2 worksheet. One asterisk (*) indicates p < .05. Two asterisks (**) indicate p < .01. These calculations assume a two-tailed test; one-tailed tests for correlations are explained in the More on relationships, part 2 worksheet. Also recall that p-values are widely misinterpreted, so it would be better to edit this part of the table by hand to reflect Bayes factors you have already calculated. We suggest using * for BF > 3, ** for BF > 10, o for BF < 0.33, and oo for BF < 0.1. Change the text at the bottom of the table accordingly.
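If you have many Bayes factors to convert, the suggested markers can be produced by a small helper. This is a minimal sketch: bf_marker() is a hypothetical function written for this example, and the Bayes factors in example_bfs are made up for illustration, not results from the attitude data.

```r
# Hypothetical helper: maps one Bayes factor to the marker suggested above
bf_marker <- function(bf) {
  if (bf > 10) {
    "**"          # BF > 10
  } else if (bf > 3) {
    "*"           # BF > 3
  } else if (bf < 0.1) {
    "oo"          # BF < 0.1
  } else if (bf < 0.33) {
    "o"           # BF < 0.33
  } else {
    ""            # between 0.33 and 3: no marker
  }
}

# Made-up Bayes factors, for illustration only
example_bfs <- c(25, 4.2, 1.1, 0.25, 0.04)
markers <- sapply(example_bfs, bf_marker)
print(data.frame(BF = example_bfs, marker = markers))
```

The stricter thresholds are tested first, so a value such as 25 receives ** rather than *.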
For this exercise, we’ll load some data from a study which measured aspects of participants’ personality.
# Exercise 1
library(tidyverse)
big5 <- read_csv('case-studies/jon-may/big5_total.csv')
Parsed with column specification:
cols(
  subj = col_double(),
  openness = col_double(),
  conscientiousness = col_double(),
  extraversion = col_double(),
  agreeableness = col_double(),
  neuroticism = col_double()
)
The first few rows show that the scale used measured the ‘big 5’ personality factors: openness to experience, conscientiousness, extraversion, agreeableness and neuroticism (OCEAN).
Create a correlation matrix for the five personality factors. Number the table as “Table 2”, and save the results in
table2.doc. Your table should look like this in RStudio:
Table 2

Means, standard deviations, and correlations with confidence intervals

Variable              M     SD   1           2           3           4
1. openness           23.15 6.78

2. conscientiousness  25.10 7.23 .15
                                 [-.14, .42]

3. extraversion       21.50 7.86 .27         -.01
                                 [-.02, .51] [-.29, .28]

4. agreeableness      33.54 4.55 .27         .20         .43**
                                 [-.01, .52] [-.09, .46] [.17, .64]

5. neuroticism        16.00 7.41 .34*        .28         .13         .07
                                 [.06, .57]  [-.00, .52] [-.16, .40] [-.22, .34]

Note. M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation (Cumming, 2014). * indicates p < .05. ** indicates p < .01.
…and it should be APA formatted in the file
Copy the R code you used for this exercise into PsycEL
As with graphs, there is often an element of design involved in presenting tabular data in a format most useful for your reader. Packages like
apaTables are useful for producing APA tables where there is a standard way to present data. However, you often need a table which is customised to present your data in the most useful format. The cost of custom tables is that the content requires a little more preprocessing, and styling the table according to APA standards will require some hand-formatting in your word processor.
We’ll demonstrate this process by producing a table of descriptive statistics. The data we’ll use comes from an experiment which evaluated children’s language development using the Words in Game (WinG) test. WinG consists of a set of picture cards which are used in four tests: noun comprehension, noun production, predicate comprehension, and predicate production. The Italian and English versions of the WinG cards use different pictures to depict the associated words. The experiment tested whether English-speaking children aged approximately 30 months produce similar responses for the two sets of cards. We would like to produce a single table, containing descriptive statistics for all four tests.
We start by loading the data:
# Load data
wing_preproc <- read_csv('going-further/picture-naming-preproc.csv')
The first few rows of
wing_preproc look like this:
Our test scores are currently in wide format (lots of columns, few rows), but R generally requires data to be in long format (lots of rows, few columns). This means we first have to make the data frame longer, so we can calculate summary statistics.
# wide to long
task_by_subj <- wing_preproc %>%
  pivot_longer(cols = c(nc, np, pc, pp),
               names_to = 'task',
               values_to = 'correct') %>%
  select(subj, gender, cards, task, correct)
Explanation of command:
In the Within-subject differences worksheet, you learned how to use
pivot_wider() to widen long data frames. The
pivot_longer() command does the reverse – it lengthens wide data frames.
cols = c(nc, np, pc, pp) selects the columns we want to pivot. Each value in these columns is added to a row in a new column called correct (values_to = 'correct'). In the same row, a new column called task is set to the name of the column which the value came from (names_to = 'task'). All of the values in the other columns are duplicated for each row. Finally, we select just the columns we want for our table of descriptive statistics.
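The effect of pivot_longer() is easiest to see on a very small data frame. The sketch below uses made-up scores for two children and only two of the four task columns; the data frame names wide and long are our own, and the numbers are not taken from the WinG data.

```r
library(tidyr)

# Made-up scores for two children; not taken from the WinG data
wide <- data.frame(
  subj = c(1, 2),
  nc   = c(18, 15),
  np   = c(12, 10)
)

# One row per child per task: the old column names go into `task`,
# the values they held go into `correct`
long <- pivot_longer(wide,
                     cols = c(nc, np),
                     names_to = "task",
                     values_to = "correct")
print(long)
```

The two-row, two-task data frame becomes four rows, and each child's subj value is repeated once per task.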
The first few rows of
task_by_subj look like this:
Now we can calculate some summary statistics, using commands that we’ve already used in previous worksheets:
# Table of descriptive statistics
descript <- task_by_subj %>%
  group_by(task, gender) %>%
  summarise(mean = mean(correct, na.rm = TRUE),
            sd = sd(correct, na.rm = TRUE))
`summarise()` regrouping output by 'task' (override with `.groups` argument)
Explanation of commands:
We’ve come across group_by before; here we use it to group the data by two variables at the same time, task and gender, giving us eight groups overall. We’ve also come across summarise before, including the use of na.rm = TRUE to deal with missing data.
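The message that summarise() prints about regrouping is informational rather than an error. If you would rather not see it, you can state what should happen to the grouping with the .groups argument. The sketch below uses a made-up data frame (toy) with one missing score; the values are for illustration only.

```r
library(dplyr)

# Made-up scores, including one missing value; for illustration only
toy <- data.frame(
  task    = c("nc", "nc", "nc", "nc", "nc", "np", "np", "np", "np"),
  gender  = c("female", "female", "male", "male", "male",
              "female", "female", "male", "male"),
  correct = c(18, 16, 15, 13, NA, 12, 10, 9, 11)
)

toy_summary <- toy %>%
  group_by(task, gender) %>%
  summarise(mean = mean(correct, na.rm = TRUE), # NA is ignored
            sd = sd(correct, na.rm = TRUE),
            .groups = "drop")                   # return an ungrouped result
print(toy_summary)
```

With .groups = "drop" the result is an ordinary, ungrouped data frame with one row per combination of task and gender.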
Our data now looks like this:
The descript data frame contains just the numbers we want to include in our report: the means and standard deviations for each of the eight groups. However, the row labels (np, etc.) are not particularly clear, so we replace them with something more human readable:
task_names <- c(
  nc = 'Noun Comprehension',
  np = 'Noun Production',
  pc = 'Predicate Comprehension',
  pp = 'Predicate Production'
)
descript$task <- descript$task %>% recode(!!!task_names)
Explanation of commands: We’re using the
recode command that we’ve previously used in the cleaning up questionnaire data worksheet:
We start by telling R what each of the codes,
nc etc., mean. So, for example
nc = 'Noun Comprehension'. We combine the four ‘translations’ together into
c() (short for ‘concatenate’, i.e. put things together).
We then take the task column of the descript data frame (descript$task) and pipe (%>%) it to recode, where it uses task_names to do the recoding. We write (<-) that result back into the same column.
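The recoding step can be tried out on a short vector before applying it to a data frame. The sketch below uses a made-up vector of codes and shows a base R lookup that gives the same result, which may be useful because recent versions of dplyr mark recode() as superseded. The names codes and lookup are our own.

```r
library(dplyr)

# Made-up vector of task codes, for illustration only
codes <- c("nc", "pp", "np")

# Named vector: the names are the old codes, the values are the new labels
lookup <- c(
  nc = "Noun Comprehension",
  np = "Noun Production",
  pc = "Predicate Comprehension",
  pp = "Predicate Production"
)

# !!! splices the named vector into recode() as separate old = new arguments
recoded <- recode(codes, !!!lookup)
print(recoded)

# Base R alternative: index the lookup vector by the codes
recoded_base <- unname(lookup[codes])
print(recoded_base)
```

Both approaches replace each code with its label and leave the order of the vector unchanged.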
Our table now looks like this:
Our table is now clear and easy to read. We could include it in a report without much further effort, and the reader would be able to easily see what we wanted to show them. However, it is not quite in the format that psychologists are most familiar with (which is APA format). In APA format, the table would look more like this:
|Task||Female (M)||Female (SD)||Male (M)||Male (SD)|
In other words, it would be wider: more columns and fewer rows.
We can widen the table, using the
pivot_wider command we have previously used in the within-subject differences worksheet:
# Widen table
descript_table <- descript %>%
  pivot_wider(names_from = gender, values_from = c(mean, sd))
Our table now has the same format as an APA table…
…but the columns are in a different order. APA format dictates that means should be placed next to their associated standard deviations in a table (APA format is weirdly specific). Fortunately, we can rearrange columns using the
select command that we’ve come across before:
# Re-order columns
descript_table <- descript_table %>%
  select(task, mean_female, sd_female, mean_male, sd_male)
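To see how pivot_wider() names the new columns when there are two value columns, it can help to run it on a tiny data frame first. This sketch uses made-up means and standard deviations; the data frame names toy, wide and reordered are our own.

```r
library(tidyr)
library(dplyr)

# Made-up summary statistics, for illustration only
toy <- data.frame(
  task   = c("Noun Comprehension", "Noun Comprehension",
             "Noun Production", "Noun Production"),
  gender = c("female", "male", "female", "male"),
  mean   = c(17, 14, 11, 10),
  sd     = c(1.4, 1.4, 1.4, 1.4)
)

# Each new column is named <value column>_<gender>, e.g. mean_female
wide <- pivot_wider(toy, names_from = gender, values_from = c(mean, sd))
print(names(wide))

# Put each mean next to its standard deviation
reordered <- select(wide, task, mean_female, sd_female, mean_male, sd_male)
print(reordered)
```

Printing names(wide) before writing the select() call is a simple way to check the exact column names you need to type.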
Finally, we can replace the column names with something a bit more human readable:
# Column names
colnames(descript_table) <-
  c("Task", "Female (M)", "Female (SD)", "Male (M)", "Male (SD)")
|Task||Female (M)||Female (SD)||Male (M)||Male (SD)|
Note that it would arguably be clearer to write “mean” rather than “M”, but it’s another quirk of APA style that we write “M” to stand for mean.
There are a number of different ways to get a table in R into your word processor. We’re going to use the
kableExtra package, because it’s really flexible, so it’s capable of producing almost any table you might need. We’re only going to use it in the most basic way here; for some other examples of what it can do, see the kableExtra website.
To get a version of
descript_table that you can copy and paste into your word processor, do this:
library(kableExtra)
descript_table %>%
  kable(digits=2) %>%
  kable_styling()
Explanation of commands:
The digits=2 part ensures that every number is reported to two decimal places. The kable_styling() command prints the table to the Viewer window in RStudio.
Explanation of output:
Try copying the table into your word processor now. In the Viewer pane, select all of the rows and columns in the table, then right-click and select
Copy. Open your word processor and select
Paste. (For this to work on a Mac, you will need to be working with RStudio in Chrome rather than Safari.)
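If copying from the Viewer pane does not work in your browser, one possible alternative is to write the table to an HTML file and open that file instead. This is a sketch rather than part of the worksheet's method: it uses kable() from the knitr package (which kableExtra builds on), a made-up table called toy_table, and a temporary file whose name we chose for the example.

```r
library(knitr)

# Made-up table, for illustration only
toy_table <- data.frame(
  Task = c("Noun Comprehension", "Noun Production"),
  `Female (M)` = c(17.123, 11.5),
  `Female (SD)` = c(1.414, 1.4),
  check.names = FALSE # keep the spaces and brackets in the column names
)

# Build the table as HTML, rounding to two decimal places
html <- kable(toy_table, format = "html", digits = 2)

# Write the HTML to a file (here, in R's temporary directory)
out_file <- file.path(tempdir(), "descript_table.html")
writeLines(as.character(html), out_file)
print(out_file)
```

The resulting file can be opened in a web browser, and word processors such as Microsoft Word and LibreOffice Writer can generally open HTML files too, although the styling will still need hand-formatting.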
Starting with the data in
task_by_subj, generate a table of descriptive statistics showing task accuracy for the Italian and English cards. It should look like this:
|Task||English (M)||English (SD)||Italian (M)||Italian (SD)|
Copy the R code you used for this exercise into PsycEL.
You can avoid copy-pasting tables (and all other analyses) by writing your reports using
R Markdown instead of a word processor.
R Markdown is a language for writing documents which include
R code. The code is run, and the output is included in the document.
R Markdown can be used to produce different types of document (e.g. reports, presentations, web pages), in various formats (e.g. Microsoft Word, PDF, HTML). The
Research Methods in R worksheets are written using
R Markdown, and although we don’t teach it in these materials, there are other courses which make it easy to learn.
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.