t-test


Problem

You want to test whether two samples are drawn from populations with different means, or test whether one sample is drawn from a population with a mean different from some theoretical mean.

Solution

Sample data

We will use the built-in sleep data set.

sleep
# extra group ID
#   0.7     1  1
#  -1.6     1  2
#  -0.2     1  3
#  -1.2     1  4
#  -0.1     1  5
#   3.4     1  6
#   3.7     1  7
#   0.8     1  8
#   0.0     1  9
#   2.0     1 10
#   1.9     2  1
#   0.8     2  2
#   1.1     2  3
#   0.1     2  4
#  -0.1     2  5
#   4.4     2  6
#   5.5     2  7
#   1.6     2  8
#   4.6     2  9
#   3.4     2 10

Sometimes it is useful to work with wide-formatted data, so we'll make a wide version of the sleep data.

sleep.wide <- data.frame(ID=1:10, group1=sleep$extra[1:10], group2=sleep$extra[11:20])
sleep.wide
# ID group1 group2
#  1    0.7    1.9
#  2   -1.6    0.8
#  3   -0.2    1.1
#  4   -1.2    0.1
#  5   -0.1   -0.1
#  6    3.4    4.4
#  7    3.7    5.5
#  8    0.8    1.6
#  9    0.0    4.6
# 10    2.0    3.4

Comparing two groups: independent two-sample t-test

Suppose the two groups are independently sampled; we'll ignore the ID variable for our purposes here.

The t.test function can operate on long-format data like sleep, where one column (extra) records the measurement and another column (group) specifies the grouping. It can also operate on two separate vectors.

# Welch t-test
# These two commands have the same effect.
t.test(extra ~ group, sleep)
t.test(sleep.wide$group1, sleep.wide$group2)
#       Welch Two Sample t-test
#
# data:  extra by group 
# t = -1.8608, df = 17.776, p-value = 0.07939
# alternative hypothesis: true difference in means is not equal to 0 
# 95 percent confidence interval:
#  -3.3654832  0.2054832 
# sample estimates:
# mean in group 1 mean in group 2 
#            0.75            2.33 
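The value returned by t.test is a list of class "htest", so individual results can be pulled out by name instead of being read off the printout. A minimal sketch (the variable name res is just illustrative):

```r
# Run the Welch test on the long-format data and keep the result object
res <- t.test(extra ~ group, data = sleep)

res$statistic  # t statistic, named "t"
res$parameter  # degrees of freedom, named "df"
res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval for the difference in means
res$estimate   # the two group means
```

This is handy when running many tests and collecting the p-values or intervals into a data frame.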

By default, t.test does not assume equal variances, so instead of Student's t-test it performs the Welch t-test. This is why the Welch output reports a non-integer df=17.776: the degrees of freedom are adjusted to account for unequal variances. To use Student's t-test, set var.equal=TRUE.

# Student t-test
# These two commands have the same effect.
t.test(extra ~ group, sleep, var.equal=TRUE)
t.test(sleep.wide$group1, sleep.wide$group2, var.equal=TRUE)
#       Two Sample t-test
#
# data:  extra by group 
# t = -1.8608, df = 18, p-value = 0.07919
# alternative hypothesis: true difference in means is not equal to 0 
# 95 percent confidence interval:
#  -3.363874  0.203874 
# sample estimates:
# mean in group 1 mean in group 2 
#            0.75            2.33 
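To see where the two sets of degrees of freedom come from, the statistics can be computed by hand. The sketch below uses the standard Welch-Satterthwaite approximation for the Welch test and a pooled variance for Student's test. The variable names are only illustrative, and the results are checked against t.test rather than taken as authoritative:

```r
# Split the long-format data into the two groups
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
nx <- length(x); ny <- length(y)
vx <- var(x); vy <- var(y)

# Welch: each group keeps its own variance estimate
se_welch <- sqrt(vx / nx + vy / ny)
t_welch  <- (mean(x) - mean(y)) / se_welch
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- se_welch^4 /
  ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))

# Student: one pooled variance estimate, df = nx + ny - 2
sp2       <- ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
se_pooled <- sqrt(sp2 * (1 / nx + 1 / ny))
t_pooled  <- (mean(x) - mean(y)) / se_pooled
df_pooled <- nx + ny - 2

c(t_welch = t_welch, df_welch = df_welch)      # about -1.8608 and 17.776
c(t_pooled = t_pooled, df_pooled = df_pooled)  # about -1.8608 and 18
```

With equal group sizes, as here, the two standard errors are algebraically identical, so both printouts show t = -1.8608. Only the degrees of freedom differ, and with them the p-value and the confidence interval.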

Paired-sample t-test

You can also compare paired data, using a paired-sample t-test. You might have observations before and after a treatment, or of two matched subjects with different treatments.

Again, the t.test function can be used on a data frame with a grouping variable, or on two vectors. It relies on the relative position of the observations to determine the pairing. With long-format data and a grouping variable, the first row with group=1 is paired with the first row with group=2, and so on. Make sure that the data are sorted and that there are no missing observations; otherwise the pairing can be thrown off. Note that recent versions of R no longer accept paired=TRUE in the formula method and stop with an error; the two-vector form works regardless of version.

# These two ways of doing it have the same effect.
# 1. Use long-format data with grouping variable
# 2. Use two vectors, in this case from wide-format data frame
t.test(extra ~ group, sleep, paired=TRUE)
t.test(sleep.wide$group1, sleep.wide$group2, paired=TRUE)
#        Paired t-test
#
# data:  extra by group 
# t = -4.0621, df = 9, p-value = 0.002833
# alternative hypothesis: true difference in means is not equal to 0 
# 95 percent confidence interval:
#  -2.4598858 -0.7001142 
# sample estimates:
# mean of the differences 
#                   -1.58 
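One way to avoid depending on row order is to build the wide data by matching on ID instead of by position. A sketch using base R's reshape(), assuming the ID column identifies each subject; the row shuffle and the names shuffled and sleep2 are only there for illustration:

```r
# Deliberately scramble the row order to show that pairing survives it
set.seed(1)
shuffled <- sleep[sample(nrow(sleep)), ]

# Long to wide, matching the two measurements of each subject by ID.
# The new columns are named extra.1 and extra.2 (value column + "." + group).
sleep2 <- reshape(shuffled, direction = "wide",
                  idvar = "ID", timevar = "group")

# Drop any subject that is missing one of the two measurements
sleep2 <- sleep2[complete.cases(sleep2[, c("extra.1", "extra.2")]), ]

res_paired <- t.test(sleep2$extra.1, sleep2$extra.2, paired = TRUE)
res_paired  # same t = -4.0621, df = 9 as above, despite the shuffle
```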

The paired t-test is equivalent to testing whether the difference between each pair of observations has a population mean of 0. (See below for comparing a single group to a population mean.)

t.test(sleep.wide$group1 - sleep.wide$group2, mu=0)
#        One Sample t-test
#
# data:  sleep.wide$group1 - sleep.wide$group2 
# t = -4.0621, df = 9, p-value = 0.002833
# alternative hypothesis: true mean is not equal to 0 
# 95 percent confidence interval:
#  -2.4598858 -0.7001142 
# sample estimates:
# mean of x 
#     -1.58 
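The equivalence is easy to verify by computing the one-sample t statistic on the differences directly: the mean difference divided by its standard error, compared against a t distribution with n - 1 degrees of freedom. A minimal sketch (the names d, t_paired and p_paired are illustrative):

```r
# Rebuild the wide data as above and take the within-pair differences
sleep.wide <- data.frame(ID = 1:10, group1 = sleep$extra[1:10],
                         group2 = sleep$extra[11:20])
d <- sleep.wide$group1 - sleep.wide$group2
n <- length(d)

# One-sample t statistic for H0: population mean of d equals 0
t_paired <- (mean(d) - 0) / (sd(d) / sqrt(n))
# Two-sided p-value from the t distribution with n - 1 df
p_paired <- 2 * pt(-abs(t_paired), df = n - 1)

c(t = t_paired, df = n - 1, p = p_paired)  # about -4.0621, 9, 0.002833
```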

Comparing a group against an expected population mean: one-sample t-test

Suppose that you want to test whether the data in column extra is drawn from a population whose true mean is 0. In this case, the group and ID columns are ignored.

t.test(sleep$extra, mu=0)
#
#       One Sample t-test
#
# data:  sleep$extra 
# t = 3.413, df = 19, p-value = 0.002918
# alternative hypothesis: true mean is not equal to 0 
# 95 percent confidence interval:
#  0.5955845 2.4844155 
# sample estimates:
# mean of x 
#      1.54 
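The confidence interval in this output is the sample mean plus or minus a t critical value times the standard error. A short sketch that reproduces it by hand (names are illustrative; the 0.975 quantile corresponds to the default two-sided 95% level):

```r
x  <- sleep$extra
n  <- length(x)
se <- sd(x) / sqrt(n)

# Test statistic for H0: population mean equals 0
t_stat <- (mean(x) - 0) / se   # about 3.413

# Critical value for a two-sided 95% interval with n - 1 df
t_crit <- qt(0.975, df = n - 1)
ci <- mean(x) + c(-1, 1) * t_crit * se
ci  # about 0.5956 2.4844
```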

To visualize the groups, see the Graphs pages Plotting distributions (ggplot2), Histogram and density plot, and Box plot.