# t-test

## Problem

You want to test whether two samples are drawn from populations with different means, or test whether one sample is drawn from a population with a mean different from some theoretical mean.

## Solution

### Sample data

We will use the built-in `sleep` data set.

```r
sleep
#  extra group ID
#    0.7     1  1
#   -1.6     1  2
#   -0.2     1  3
#   -1.2     1  4
#   -0.1     1  5
#    3.4     1  6
#    3.7     1  7
#    0.8     1  8
#    0.0     1  9
#    2.0     1 10
#    1.9     2  1
#    0.8     2  2
#    1.1     2  3
#    0.1     2  4
#   -0.1     2  5
#    4.4     2  6
#    5.5     2  7
#    1.6     2  8
#    4.6     2  9
#    3.4     2 10
```

Sometimes it is useful to work with wide-formatted data, so we'll make a wide version of the `sleep` data.

```r
sleep.wide <- data.frame(
    ID=1:10,
    group1=sleep$extra[1:10],
    group2=sleep$extra[11:20]
)
sleep.wide
#  ID group1 group2
#   1    0.7    1.9
#   2   -1.6    0.8
#   3   -0.2    1.1
#   4   -1.2    0.1
#   5   -0.1   -0.1
#   6    3.4    4.4
#   7    3.7    5.5
#   8    0.8    1.6
#   9    0.0    4.6
#  10    2.0    3.4
```
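The index-based construction above assumes the rows of `sleep` are already ordered by group and then by ID. As a hedged alternative, base R's `reshape()` can build the wide form by matching on `ID` rather than on row position. The name `sleep.wide2` and the renamed columns are illustrative choices, not part of the original recipe.

```r
# Build a wide version by matching on ID instead of relying on row order.
# sleep.wide2 is a hypothetical name used only for this sketch.
sleep.wide2 <- reshape(sleep, idvar="ID", timevar="group", direction="wide")

# reshape() names the value columns extra.1 and extra.2;
# rename them to match the columns of sleep.wide.
names(sleep.wide2)[names(sleep.wide2) == "extra.1"] <- "group1"
names(sleep.wide2)[names(sleep.wide2) == "extra.2"] <- "group2"

# Note: ID stays a factor here, whereas sleep.wide uses the integers 1:10.
sleep.wide2
```

This version keeps working if the rows of the long data are reordered, because each value is placed according to its `ID`.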

### Comparing two groups: independent two-sample t-test

Suppose the two groups are independently sampled; we'll ignore the ID variable for the purposes here.

The `t.test` function can operate on long-format data like `sleep`, where one column (`extra`) records the measurement and the other column (`group`) specifies the grouping; or it can operate on two separate vectors.

```r
# Welch t-test
# These two commands have the same effect.
t.test(extra ~ group, sleep)
t.test(sleep.wide$group1, sleep.wide$group2)
#  Welch Two Sample t-test
#
# data:  extra by group
# t = -1.8608, df = 17.776, p-value = 0.07939
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -3.3654832  0.2054832
# sample estimates:
# mean in group 1 mean in group 2
#            0.75            2.33
```

By default, `t.test` does **not** assume equal variances; instead of Student's t-test, it uses the Welch t-test. Note that in the Welch t-test, df=17.776, because of the adjustment for unequal variances. To use Student's t-test, set `var.equal=TRUE`.

```r
# Student t-test
# These two commands have the same effect.
t.test(extra ~ group, sleep, var.equal=TRUE)
t.test(sleep.wide$group1, sleep.wide$group2, var.equal=TRUE)
#  Two Sample t-test
#
# data:  extra by group
# t = -1.8608, df = 18, p-value = 0.07919
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -3.363874  0.203874
# sample estimates:
# mean in group 1 mean in group 2
#            0.75            2.33
```
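To see where the fractional degrees of freedom come from, the Welch-Satterthwaite approximation can be computed by hand. This is a minimal sketch; the variable names (`se2.x`, `welch.df`, and so on) are illustrative and not part of `t.test` itself.

```r
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

# Squared standard error of each group mean
se2.x <- var(x) / length(x)
se2.y <- var(y) / length(y)

# Welch-Satterthwaite approximation to the degrees of freedom
welch.df <- (se2.x + se2.y)^2 /
    (se2.x^2 / (length(x) - 1) + se2.y^2 / (length(y) - 1))

# Student's t-test pools the variances and uses n1 + n2 - 2
student.df <- length(x) + length(y) - 2

welch.df    # about 17.776, the df reported by the Welch test
student.df  # 18, the df reported by the Student test
```

With equal group sizes, as here, the t statistic is the same under both tests; only the degrees of freedom, and therefore the p-value and confidence interval, differ.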

### Paired-sample t-test

You can also compare paired data, using a paired-sample t-test. You might have observations before and after a treatment, or of two matched subjects with different treatments.

Again, the `t.test` function can be used on a data frame with a grouping variable, or on two vectors. It relies on relative position to determine the pairing. If you are using long-format data with a grouping variable, the first row with group=1 is paired with the first row with group=2. It is important to make sure that the data is sorted and that there are no missing observations; otherwise the pairing can be thrown off.

```r
# These two ways of doing it have the same effect.
# 1. Use long-format data with grouping variable
# 2. Use two vectors, in this case from wide-format data frame
t.test(extra ~ group, sleep, paired=TRUE)
t.test(sleep.wide$group1, sleep.wide$group2, paired=TRUE)
#  Paired t-test
#
# data:  extra by group
# t = -4.0621, df = 9, p-value = 0.002833
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -2.4598858 -0.7001142
# sample estimates:
# mean of the differences
#                   -1.58
```
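Because the pairing depends on row position, it can help to sort the data and check for complete pairs before running the test. The sketch below deliberately shuffles `sleep` and then restores the order; the names `shuffled`, `sorted`, `g1`, `g2`, and `res.paired` are hypothetical. It uses the two-vector interface partly because, as far as we are aware, recent R releases (4.4.0 and later) reject `paired=TRUE` in the formula interface; treat that version detail as an assumption to check against your installation.

```r
# Simulate a data frame whose rows arrived in arbitrary order
# (a fixed permutation, so the example is reproducible).
shuffled <- sleep[c(20:11, 3, 1, 2, 10:4), ]

# Sort by group, then by ID, so that position i in each group
# refers to the same subject.
sorted <- shuffled[order(shuffled$group, shuffled$ID), ]

# Check that every ID appears exactly once in each group
# (no missing or duplicated observations).
stopifnot(all(table(sorted$ID, sorted$group) == 1))

g1 <- sorted$extra[sorted$group == 1]
g2 <- sorted$extra[sorted$group == 2]

res.paired <- t.test(g1, g2, paired=TRUE)
res.paired
```

Running the paired test on `shuffled` without sorting would pair the wrong subjects and give a different, meaningless result.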

The paired t-test is equivalent to testing whether the **difference** between each pair of observations has a population mean of 0. (See below for comparing a single group to a population mean.)

```r
# var.equal has no effect in a one-sample test, so it is omitted here.
t.test(sleep.wide$group1 - sleep.wide$group2, mu=0)
#  One Sample t-test
#
# data:  sleep.wide$group1 - sleep.wide$group2
# t = -4.0621, df = 9, p-value = 0.002833
# alternative hypothesis: true mean is not equal to 0
# 95 percent confidence interval:
#  -2.4598858 -0.7001142
# sample estimates:
# mean of x
#     -1.58
```
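The same numbers can be reproduced from the differences directly, which makes the equivalence concrete. This is a sketch using only base R functions; the names `d`, `t.manual`, `p.manual`, and `ci.manual` are illustrative.

```r
# Per-subject differences (group 1 minus group 2)
d <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]
n <- length(d)

# Standard error of the mean difference
se <- sd(d) / sqrt(n)

# t statistic for H0: mean difference = 0, with n - 1 degrees of freedom
t.manual <- mean(d) / se

# Two-sided p-value from the t distribution
p.manual <- 2 * pt(-abs(t.manual), df=n - 1)

# 95 percent confidence interval for the mean difference
ci.manual <- mean(d) + c(-1, 1) * qt(0.975, df=n - 1) * se

t.manual   # about -4.0621
p.manual   # about 0.002833
ci.manual  # about -2.4599 to -0.7001
```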

### Comparing a group against an expected population mean: one-sample t-test

Suppose that you want to test whether the data in column `extra` is drawn from a population whose true mean is 0. In this case, the `group` and `ID` columns are ignored.

```r
t.test(sleep$extra, mu=0)
#
#  One Sample t-test
#
# data:  sleep$extra
# t = 3.413, df = 19, p-value = 0.002918
# alternative hypothesis: true mean is not equal to 0
# 95 percent confidence interval:
#  0.5955845 2.4844155
# sample estimates:
# mean of x
#      1.54
```
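If you need the numbers rather than the printed summary, the object returned by `t.test` can be stored and its components extracted. The sketch below also shows a one-sided test through the `alternative` argument; the names `res` and `res.greater` are illustrative.

```r
# Store the result instead of printing it
res <- t.test(sleep$extra, mu=0)

res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # two-sided p-value
res$conf.int   # confidence interval for the mean
res$estimate   # sample mean

# One-sided test: is the true mean greater than 0?
res.greater <- t.test(sleep$extra, mu=0, alternative="greater")
res.greater$p.value
```

Because the sample mean here is above 0, the one-sided p-value for `alternative="greater"` is half the two-sided p-value.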

To visualize the groups, see ../../Graphs/Plotting distributions (ggplot2), ../../Graphs/Histogram and density plot, and ../../Graphs/Box plot.