## Problem

You want to test whether two samples are drawn from populations with different means, or test whether one sample is drawn from a population with a mean different from some theoretical mean.

## Solution

### Sample data

We will use the built-in `sleep` data set.

``````
sleep
#>    extra group ID
#> 1    0.7     1  1
#> 2   -1.6     1  2
#> 3   -0.2     1  3
#> 4   -1.2     1  4
#> 5   -0.1     1  5
#> 6    3.4     1  6
#> 7    3.7     1  7
#> 8    0.8     1  8
#> 9    0.0     1  9
#> 10   2.0     1 10
#> 11   1.9     2  1
#> 12   0.8     2  2
#> 13   1.1     2  3
#> 14   0.1     2  4
#> 15  -0.1     2  5
#> 16   4.4     2  6
#> 17   5.5     2  7
#> 18   1.6     2  8
#> 19   4.6     2  9
#> 20   3.4     2 10
``````

We’ll also make a wide version of the `sleep` data; below we’ll see how to work with data in both long and wide formats.

``````
sleep_wide <- data.frame(
    ID=1:10,
    group1=sleep$extra[1:10],
    group2=sleep$extra[11:20]
)
sleep_wide
#>    ID group1 group2
#> 1   1    0.7    1.9
#> 2   2   -1.6    0.8
#> 3   3   -0.2    1.1
#> 4   4   -1.2    0.1
#> 5   5   -0.1   -0.1
#> 6   6    3.4    4.4
#> 7   7    3.7    5.5
#> 8   8    0.8    1.6
#> 9   9    0.0    4.6
#> 10 10    2.0    3.4
``````
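The wide version above is built by slicing rows 1-10 and 11-20, which only works because `sleep` is already ordered by group and ID. As a minimal sketch of an alternative, base R's `reshape` can do the conversion by matching on `ID` instead of on row position. The name `sleep_wide2` is chosen here for illustration only; note that `ID` stays a factor in this version.

```r
# Convert long -> wide by matching on ID rather than on row position.
# 'sleep_wide2' is an illustrative name, not part of the original recipe.
sleep_wide2 <- reshape(sleep, idvar = "ID", timevar = "group", direction = "wide")

# reshape() names the new columns extra.1 and extra.2; rename to group1/group2
names(sleep_wide2) <- sub("^extra\\.", "group", names(sleep_wide2))

head(sleep_wide2, 3)
```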

### Comparing two groups: independent two-sample t-test

Suppose the two groups are independently sampled; we’ll ignore the `ID` variable for this example.

The `t.test` function can operate on long-format data like `sleep`, where one column (`extra`) records the measurement, and the other column (`group`) specifies the grouping; or it can operate on two separate vectors.

``````
# Welch t-test
t.test(extra ~ group, sleep)
#>
#> 	Welch Two Sample t-test
#>
#> data:  extra by group
#> t = -1.8608, df = 17.776, p-value = 0.07939
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -3.3654832  0.2054832
#> sample estimates:
#> mean in group 1 mean in group 2
#>            0.75            2.33

# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2)
``````
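As a quick check that the formula interface and the two-vector interface describe the same test, the sketch below runs both and compares them. It also shows how to pull individual numbers out of the result, which is a list of class `htest`. The variable names (`x`, `y`, `res_long`, `res_vec`) are illustrative.

```r
# Formula interface on the long-format data
res_long <- t.test(extra ~ group, data = sleep)

# Two-vector interface: split the measurements by group first
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
res_vec <- t.test(x, y)

# Both calls run the same Welch test, so the statistics agree
all.equal(unname(res_long$statistic), unname(res_vec$statistic))

# Individual components of the result
res_long$p.value    # p-value
res_long$conf.int   # confidence interval for the difference in means
res_long$estimate   # the two group means
```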

By default, `t.test` does not assume equal variances: it uses the Welch t-test rather than Student’s t-test. Note that in the Welch t-test, df=17.776 rather than 18, because of the adjustment for unequal variances. To use Student’s t-test, set `var.equal=TRUE`.

``````
# Student t-test
t.test(extra ~ group, sleep, var.equal=TRUE)
#>
#> 	Two Sample t-test
#>
#> data:  extra by group
#> t = -1.8608, df = 18, p-value = 0.07919
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -3.363874  0.203874
#> sample estimates:
#> mean in group 1 mean in group 2
#>            0.75            2.33

# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2, var.equal=TRUE)
``````
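To see where the two df values come from: Student’s test uses n1 + n2 - 2 = 18, while the Welch test uses the Welch-Satterthwaite approximation, which depends on the two sample variances. The following is a sketch of that calculation by hand, compared against what `t.test` reports; the variable names are illustrative.

```r
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

# Squared standard error of each group mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare with the values computed by t.test()
welch <- t.test(x, y)
all.equal(df_welch, unname(welch$parameter))
all.equal(p_value, welch$p.value)
```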

### Paired-sample t-test

You can also compare paired data, using a paired-sample t-test. You might have observations before and after a treatment, or of two matched subjects with different treatments.

Again, the `t.test` function can be used on a data frame with a grouping variable, or on two vectors. It relies on relative position to determine the pairing: with long-format data and a grouping variable, the first row with group=1 is paired with the first row with group=2, and so on. Make sure that the data is sorted and that no observations are missing; otherwise the pairing can be thrown off. In this case, we can sort by the `group` and `ID` variables to ensure that the order is the same. For more on sorting, see Sorting.

``````
# Sort by group then ID
sleep <- sleep[order(sleep$group, sleep$ID), ]

# Paired t-test
t.test(extra ~ group, sleep, paired=TRUE)
#>
#> 	Paired t-test
#>
#> data:  extra by group
#> t = -4.0621, df = 9, p-value = 0.002833
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -2.4598858 -0.7001142
#> sample estimates:
#> mean of the differences
#>                   -1.58

# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2, paired=TRUE)
``````
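Because the pairing depends on row position, one defensive option is to match the two groups explicitly on `ID` before testing. The sketch below uses `merge` for this; it deliberately reverses the second group to show that the result does not depend on row order. The names `g1`, `g2`, and `matched` are illustrative.

```r
# Split the long data into one data frame per group
g1 <- sleep[sleep$group == 1, c("ID", "extra")]
g2 <- sleep[sleep$group == 2, c("ID", "extra")]

# Reverse the second group to simulate data that is not sorted
g2 <- g2[rev(seq_len(nrow(g2))), ]

# merge() pairs rows by ID and drops any ID missing from either group
matched <- merge(g1, g2, by = "ID", suffixes = c(".g1", ".g2"))

# Paired t-test on the explicitly matched vectors
res_paired <- t.test(matched$extra.g1, matched$extra.g2, paired = TRUE)
res_paired
```

The two-vector form may also be the more portable choice: newer versions of R (4.4.0 onward, as far as we know) reject `paired=TRUE` in the formula interface.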

The paired t-test is equivalent to testing whether the difference between each pair of observations has a population mean of 0. (See below for comparing a single group to a population mean.)

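``````
t.test(sleep_wide$group1 - sleep_wide$group2, mu=0)
#>
#> 	One Sample t-test
#>
#> data:  sleep_wide$group1 - sleep_wide$group2
#> t = -4.0621, df = 9, p-value = 0.002833
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  -2.4598858 -0.7001142
#> sample estimates:
#> mean of x
#>     -1.58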
``````

### Comparing a group against an expected population mean: one-sample t-test

Suppose that you want to test whether the data in column `extra` is drawn from a population whose true mean is 0. In this case, the `group` and `ID` columns are ignored.

``````
t.test(sleep$extra, mu=0)
#>
#> 	One Sample t-test
#>
#> data:  sleep$extra
#> t = 3.413, df = 19, p-value = 0.002918
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  0.5955845 2.4844155
#> sample estimates:
#> mean of x
#>      1.54
``````
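The numbers in this output can be reproduced by hand, which helps show what the test is doing. The sketch below computes the one-sample t statistic, the two-sided p-value, and the 95% confidence interval directly, then compares them with `t.test`. The variable names are illustrative.

```r
x   <- sleep$extra
mu0 <- 0            # hypothesized population mean
n   <- length(x)

# Standard error of the sample mean
se <- sd(x) / sqrt(n)

# t statistic: distance of the sample mean from mu0, in standard errors
t_stat <- (mean(x) - mu0) / se

# Two-sided p-value with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the population mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Compare with t.test()
res_one <- t.test(x, mu = mu0)
all.equal(t_stat, unname(res_one$statistic))
all.equal(ci, as.numeric(res_one$conf.int))
```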

To visualize the groups, see ../../Graphs/Plotting distributions (ggplot2), ../../Graphs/Histogram and density plot, and ../../Graphs/Box plot.