In the Evidence worksheet, we did the following t-test:
t.test(cpsdata$income ~ cpsdata$sex)
Welch Two Sample t-test

data:  cpsdata$income by cpsdata$sex
t = -3.6654, df = 9991.3, p-value = 0.0002482
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14518.10  -4400.64
sample estimates:
mean in group female   mean in group male
            82677.29             92136.66
Here’s a more detailed explanation of the output of that test – we’ll go through each bit:
Two Sample t-test - It’s two sample because you have two different groups (“samples”) of people being compared in the test – females and males.
data: cpsdata$income by cpsdata$sex - This just reminds you what data you’re analyzing. It’s basically a copy of what you told R to do, i.e.
cpsdata$income ~ cpsdata$sex
alternative hypothesis: true difference in means is not equal to 0 - This is a way of saying that, before looking at the data, you made no assumptions about whether men would earn more than women, or vice versa. This is sometimes called a two-tailed test - see below if you can safely assume a direction before looking at your data (also called a one-tailed test).
t = -3.6654 - This is the t value – the output of a t-test. It’s a bit like an effect size, except it’s harder to interpret, because its value is also affected by the sample size (larger samples mean larger t values, other things being equal). A t value isn’t at all useful on its own but along with the degrees of freedom (see below), we can use it to calculate the p value (also see below). The t-value is negative for the reason explained in the one-tailed tests section, below. Generally speaking, psychologists ignore the minus sign when reporting t values in their papers, although people differ on this.
df = 9991.3 - df is short for degrees of freedom. In a “Student” t-test, the degrees of freedom is the sample size, minus the number of means you’ve calculated from that sample. In a Welch t-test, this number is corrected to deal with some of the problems with the Student’s t-test. This correction makes the Welch t-test more accurate than the Student’s t-test.
p-value = 0.0002482 - This is the p value of the t-test. It’s the probability of getting a difference at least as large as the one in your data, under the assumption that there is no difference between the groups in the population (sometimes called the null hypothesis). You need the t value and the degrees of freedom to be able to calculate the p value … but R does those calculations for you.
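To make the link between t, df and p concrete, here is a minimal sketch that works all three out by hand and compares them with what t.test reports. It assumes the standard Welch formulas, and it uses simulated incomes as a stand-in for cpsdata: the values, the group sizes, and the names income, sex, f and m are all made up for illustration.

```r
# Simulated stand-in for cpsdata: made-up incomes, NOT the real data
set.seed(1)
income <- c(rnorm(50, mean = 80000, sd = 20000),
            rnorm(60, mean = 90000, sd = 30000))
sex <- rep(c("female", "male"), times = c(50, 60))

f <- income[sex == "female"]  # group 1 (alphabetically first)
m <- income[sex == "male"]    # group 2

# Squared standard error of each group mean
vf <- var(f) / length(f)
vm <- var(m) / length(m)

# t value: difference in means (group 1 minus group 2) over its standard error
t_by_hand <- (mean(f) - mean(m)) / sqrt(vf + vm)

# Welch-corrected degrees of freedom (Welch-Satterthwaite formula)
df_by_hand <- (vf + vm)^2 /
  (vf^2 / (length(f) - 1) + vm^2 / (length(m) - 1))

# Two-tailed p value: area in both tails of the t distribution beyond |t|
p_by_hand <- 2 * pt(-abs(t_by_hand), df = df_by_hand)

# Compare with what t.test() reports for the same simulated data
res <- t.test(income ~ sex)
print(c(t = t_by_hand, df = df_by_hand, p = p_by_hand))
print(c(t = unname(res$statistic), df = unname(res$parameter), p = res$p.value))
```

The two printed lines should agree. Note that the Welch df is not a whole number, which is why the output above shows df = 9991.3 rather than an integer.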
Although you can get these in other ways, for convenience the t.test command gives you the mean for each group:
sample estimates:
mean in group female   mean in group male
            82677.29             92136.66
These are the mean incomes for the two groups, $82677.29 for females and $92136.66 for males. In our sample, women earn about $9459 less than men, on average.
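As one example of the "other ways" to get the group means, the sketch below uses base R's tapply. It runs on a simulated stand-in for cpsdata (the data frame cps_sim and its values are invented for illustration), then repeats the subtraction with the two means reported in the worksheet output.

```r
# Simulated stand-in for cpsdata: made-up incomes, NOT the real data
set.seed(1)
cps_sim <- data.frame(
  income = c(rnorm(50, mean = 80000, sd = 20000),
             rnorm(60, mean = 90000, sd = 30000)),
  sex = rep(c("female", "male"), times = c(50, 60))
)

# One mean per group, using base R
group_means <- tapply(cps_sim$income, cps_sim$sex, mean)
print(group_means)

# Difference in means, group 1 (female) minus group 2 (male)
sim_diff <- as.numeric(group_means["female"] - group_means["male"])
print(sim_diff)

# The same subtraction with the means reported in the worksheet output
reported_diff <- 82677.29 - 92136.66
print(reported_diff)
```

The last line gives the difference of about $9459 quoted above, with a minus sign because the female mean is the smaller of the two.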
How big is this difference likely to be in the US population as a whole, assuming our sample is representative of that population? This is where this part of the output comes in:
95 percent confidence interval: -14518.10 -4400.64
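This 95% confidence interval tells us that, in the population, women very likely earn somewhere between $4400.64 and $14518.10 less than men, on average. Both limits are negative because R calculates the difference as the female mean minus the male mean. If we had collected more data, we could have been more precise.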
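The confidence interval can be rebuilt from the other numbers in the output, which shows how the pieces fit together. This is a rough sketch rather than a description of what t.test does internally: it recovers the standard error from the reported t value, so the result can differ from R's interval by a few cents because the reported values are rounded. The variable names are made up for illustration.

```r
# Values copied from the t.test() output earlier in this section
mean_female <- 82677.29
mean_male   <- 92136.66
t_value     <- -3.6654
df          <- 9991.3

# Difference in means, group 1 (female) minus group 2 (male)
mean_diff <- mean_female - mean_male

# t = difference / standard error, so the standard error can be recovered
se <- mean_diff / t_value

# Critical t value that leaves 2.5% in each tail
t_crit <- qt(0.975, df = df)

# Interval: difference plus or minus (critical value x standard error)
ci <- mean_diff + c(-1, 1) * t_crit * se
print(round(ci, 2))
```

A larger sample would shrink the standard error, and with it the width of the interval.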
The 95% confidence interval is the only thing reported by a t-test that is both useful and easy to interpret. Psychologists are now encouraged to report it in their papers, like this:
Women earned less than men, [-$14518, -$4400], d = .20, t(9991.3) = 3.67, p < .05.
In a one-tailed test, you decide before looking at your data which direction the effect should be in. For example, you may have read a lot of scientific papers about the gender pay gap, so you’re pretty sure that if you find a difference in your sample, it’ll be the women who earn less.
R deals with groups in alphabetical order of the label you gave them, so females are group 1, and males are group 2 (because f comes before m in the alphabet). You expect the group 1 mean to be less than the group 2 mean. So you use this command to do this one-tailed test:
t.test(cpsdata$income ~ cpsdata$sex, alternative = "less")
The t value is negative because R calculates the mean difference as group 1 minus group 2.
If your hypothesis was that females earn more than males, you’d use this command instead:
t.test(cpsdata$income ~ cpsdata$sex, alternative = "greater")
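To see what the choice of alternative changes, the sketch below works out the three possible p values from the t value and degrees of freedom reported for the two-tailed test. It assumes, as is the case for t.test, that t and df stay the same whichever alternative you pick and only the p value (and confidence interval) change. The variable names are made up for illustration.

```r
# Values copied from the two-tailed output earlier in this section
t_value <- -3.6654
df      <- 9991.3

# Two-tailed: area in both tails beyond |t|
p_two_sided <- 2 * pt(-abs(t_value), df = df)

# alternative = "less": area below t (group 1 mean less than group 2 mean)
p_less <- pt(t_value, df = df)

# alternative = "greater": area above t
p_greater <- pt(t_value, df = df, lower.tail = FALSE)

print(c(two_sided = p_two_sided, less = p_less, greater = p_greater))
```

Because the t value here is negative, the "less" p value is half the two-tailed one, while the "greater" p value is close to 1: the data give no support at all to the idea that females earn more.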
This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.