Problem

You want to test whether two samples are drawn from populations with different means, or test whether one sample is drawn from a population with a mean different from some theoretical mean.

Solution

Sample data

We will use the built-in sleep data set.

sleep
#>    extra group ID
#> 1    0.7     1  1
#> 2   -1.6     1  2
#> 3   -0.2     1  3
#> 4   -1.2     1  4
#> 5   -0.1     1  5
#> 6    3.4     1  6
#> 7    3.7     1  7
#> 8    0.8     1  8
#> 9    0.0     1  9
#> 10   2.0     1 10
#> 11   1.9     2  1
#> 12   0.8     2  2
#> 13   1.1     2  3
#> 14   0.1     2  4
#> 15  -0.1     2  5
#> 16   4.4     2  6
#> 17   5.5     2  7
#> 18   1.6     2  8
#> 19   4.6     2  9
#> 20   3.4     2 10

We’ll also make a wide version of the sleep data; below we’ll see how to work with data in both long and wide formats.

sleep_wide <- data.frame(
    ID=1:10,
    group1=sleep$extra[1:10],
    group2=sleep$extra[11:20]
)
sleep_wide
#>    ID group1 group2
#> 1   1    0.7    1.9
#> 2   2   -1.6    0.8
#> 3   3   -0.2    1.1
#> 4   4   -1.2    0.1
#> 5   5   -0.1   -0.1
#> 6   6    3.4    4.4
#> 7   7    3.7    5.5
#> 8   8    0.8    1.6
#> 9   9    0.0    4.6
#> 10 10    2.0    3.4
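The sleep_wide frame above is built by row position, which works only because sleep happens to be sorted by group and then ID. Below is a minimal sketch of a position-independent alternative using base R's reshape(), which matches rows on ID instead. The name sleep_wide2 is just for illustration. reshape() names the new columns extra.1 and extra.2 rather than group1 and group2.

```r
# Build a wide version by matching on ID rather than on row position.
# reshape() comes with base R (stats package); v.names defaults to "extra",
# so the new columns are named extra.1 and extra.2 (one per group level).
sleep_wide2 <- reshape(sleep, idvar = "ID", timevar = "group", direction = "wide")
sleep_wide2
```

The extra.1 and extra.2 columns should hold the same values as group1 and group2 above, but they would stay correct even if the rows of sleep were shuffled.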

Comparing two groups: independent two-sample t-test

Suppose the two groups are independently sampled; we'll ignore the ID variable for our purposes here.

The t.test function can operate on long-format data like sleep, where one column (extra) records the measurement and the other column (group) specifies the grouping, or it can operate on two separate vectors.

# Welch t-test
t.test(extra ~ group, sleep)
#> 
#> 	Welch Two Sample t-test
#> 
#> data:  extra by group
#> t = -1.8608, df = 17.776, p-value = 0.07939
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -3.3654832  0.2054832
#> sample estimates:
#> mean in group 1 mean in group 2 
#>            0.75            2.33

# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2)
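To check that the two interfaces really run the same test, you can compare their results directly. This sketch pulls the two groups out of sleep as vectors (the names g1, g2, res_long and res_wide are only illustrative) and compares the t statistic, degrees of freedom and p-value.

```r
# Formula interface on the long data
res_long <- t.test(extra ~ group, data = sleep)

# Two-vector interface on the same measurements, split by group
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]
res_wide <- t.test(g1, g2)

# Only the "data:" label differs; the numbers should match
all.equal(unname(res_long$statistic), unname(res_wide$statistic))
all.equal(unname(res_long$parameter), unname(res_wide$parameter))
all.equal(res_long$p.value, res_wide$p.value)
```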

By default, t.test does not assume equal variances, so it runs the Welch t-test rather than Student's t-test. That is why the Welch output reports df=17.776 instead of a whole number: the degrees of freedom are adjusted for unequal variances. To use Student's t-test, set var.equal=TRUE.
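To see where the fractional df comes from, here is a sketch that computes the Welch statistic by hand, using the Welch-Satterthwaite approximation for the degrees of freedom. The variable names are illustrative; the result should agree with t.test(extra ~ group, sleep) (roughly t = -1.8608, df = 17.776).

```r
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]
n1 <- length(g1)
n2 <- length(g2)

# Squared standard errors of each group mean (variances are NOT pooled)
v1 <- var(g1) / n1
v2 <- var(g2) / n2

t_welch  <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
# Two-sided p-value
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

c(t = t_welch, df = df_welch, p = p_welch)
```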

# Student t-test
t.test(extra ~ group, sleep, var.equal=TRUE)
#> 
#> 	Two Sample t-test
#> 
#> data:  extra by group
#> t = -1.8608, df = 18, p-value = 0.07919
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -3.363874  0.203874
#> sample estimates:
#> mean in group 1 mean in group 2 
#>            0.75            2.33

# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2, var.equal=TRUE)
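For comparison, Student's t-test pools the two sample variances into one estimate and uses n1 + n2 - 2 degrees of freedom, which is why the output above shows df = 18. A minimal sketch of that calculation (names are illustrative):

```r
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]
n1 <- length(g1)
n2 <- length(g2)

# Pooled variance: weighted average of the two sample variances
df_student <- n1 + n2 - 2
sp2 <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / df_student

t_student <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_student <- 2 * pt(-abs(t_student), df_student)

c(t = t_student, df = df_student, p = p_student)
```

With equal group sizes, as here, the t statistic is the same as Welch's; only the degrees of freedom, and therefore the p-value, change.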

Paired-sample t-test

You can also compare paired data using a paired-sample t-test. For example, you might have observations before and after a treatment, or observations of two matched subjects given different treatments.

Again, t.test can be used on a data frame with a grouping variable, or on two vectors. Either way, it relies on the relative position of the observations to determine the pairing. If you are using long-format data with a grouping variable, the first row with group=1 is paired with the first row with group=2, the second with the second, and so on. It is important to make sure that the data are sorted and that there are no missing observations; otherwise the pairing can be thrown off. In this case, we can sort by the group and ID variables to ensure that the order is the same within each group. For more on sorting, see Sorting.

# Sort by group then ID
sleep <- sleep[order(sleep$group, sleep$ID), ]

# Paired t-test
t.test(extra ~ group, sleep, paired=TRUE)
#> 
#> 	Paired t-test
#> 
#> data:  extra by group
#> t = -4.0621, df = 9, p-value = 0.002833
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -2.4598858 -0.7001142
#> sample estimates:
#> mean of the differences 
#>                   -1.58

# Same for wide data (two separate vectors)
# In R 4.4.0 and later, the formula method rejects paired=TRUE; use this two-vector form instead
# t.test(sleep_wide$group1, sleep_wide$group2, paired=TRUE)
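Rather than relying on sort order, you can pair observations explicitly by ID before testing. The sketch below uses base R's merge() to join the two groups on ID, so the pairing stays correct even if the rows are out of order; merge() also silently drops any ID that is missing from either group. The names g1, g2, paired and res_paired are illustrative.

```r
# Split the long data by group, keeping only ID and the measurement
g1 <- sleep[sleep$group == "1", c("ID", "extra")]
g2 <- sleep[sleep$group == "2", c("ID", "extra")]

# Deliberately reverse group 2: merging on ID still pairs the rows correctly
g2 <- g2[rev(seq_len(nrow(g2))), ]

# Inner join on ID; the two "extra" columns get the suffixes .g1 and .g2
paired <- merge(g1, g2, by = "ID", suffixes = c(".g1", ".g2"))

res_paired <- t.test(paired$extra.g1, paired$extra.g2, paired = TRUE)
res_paired
```

This should reproduce the paired result above (mean difference -1.58, df = 9).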

The paired t-test is equivalent to testing whether the differences between paired observations have a population mean of 0. (See below for comparing a single group to a population mean.)

t.test(sleep_wide$group1 - sleep_wide$group2, mu=0)
# Same t, df, and p-value as the paired t-test: t = -4.0621, df = 9, p-value = 0.002833
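The equivalence follows directly from the formula for a one-sample t statistic applied to the differences: t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. A minimal sketch computing it by hand (the pairing relies on sleep being ordered by ID within each group, as it is by default):

```r
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

d <- g1 - g2          # within-pair differences
n <- length(d)

t_stat <- mean(d) / (sd(d) / sqrt(n))     # one-sample t against mu = 0
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

c(t = t_stat, df = n - 1, p = p_val)
```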

Comparing a group against an expected population mean: one-sample t-test

Suppose that you want to test whether the data in column extra is drawn from a population whose true mean is 0. In this case, the group and ID columns are ignored.

t.test(sleep$extra, mu=0)
#> 
#> 	One Sample t-test
#> 
#> data:  sleep$extra
#> t = 3.413, df = 19, p-value = 0.002918
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  0.5955845 2.4844155
#> sample estimates:
#> mean of x 
#>      1.54
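The confidence interval in this output is the sample mean plus or minus a t quantile times the standard error. A sketch reproducing the statistic and the 95% interval by hand (names are illustrative):

```r
x  <- sleep$extra
n  <- length(x)
mu <- 0                              # hypothesized population mean
se <- sd(x) / sqrt(n)                # standard error of the mean

t_stat <- (mean(x) - mu) / se
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% CI: mean +/- qt(0.975, n - 1) * se
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

list(t = t_stat, p = p_val, conf.int = ci)
```

The interval should match the 0.5955845 to 2.4844155 shown above.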

To visualize the groups, see Plotting distributions (ggplot2), Histogram and density plot, and Box plot.