library(knitr)
library(ggplot2)
R makes it really easy to perform statistical and data analytic tasks. More often than not, the hardest part is not the code itself, but rather figuring out what model to fit and how to interpret the output. This is certainly the case with hypothesis testing. Running a t-test is easy (just type t.test(...)), but interpreting the output can be tricky.
The purpose of these notes is to fill in various common gaps that you may currently have in your understanding of hypothesis testing. You should think of these notes not as everything there is to know about hypothesis testing, but rather as everything you need to know for the purposes of this class.
Suppose that someone tells you they’ve come up with a miraculous IQ-boosting drug. They even have the data to prove it!
Here’s the data they show you:
aggregate(iq ~ groups, drug.data, function(x) round(mean(x), 1))
## groups iq
## 1 control 111.3
## 2 treatment 115.6
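The drug.data object itself is not included in these notes. As a sketch for following along, the code below simulates a dataset with the same structure (23 controls, 19 treated); the group means, standard deviations, and seed are made up, so the summary numbers will not match the output above exactly.

```r
# drug.data is not shown in the notes; this only mimics its structure.
# Means, SDs, and the seed are invented for illustration.
set.seed(1)
drug.data <- data.frame(
  groups = rep(c("control", "treatment"), times = c(23, 19)),
  iq = c(rnorm(23, mean = 111, sd = 12),
         rnorm(19, mean = 115, sd = 12))
)
aggregate(iq ~ groups, drug.data, function(x) round(mean(x), 1))
```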
Interesting… it looks like the average IQ in the group that took the drug is about 4 points higher than in the control (placebo) group. Let’s think about things statistically.
First, how many people were in each group? Sample size matters a lot.
table(drug.data$groups)
##
## control treatment
## 23 19
That’s not a very big sample size. Let’s run a t-test to assess whether the observed difference in average IQ is statistically significant.
ttest.iq <- t.test(iq ~ groups, data = drug.data)
ttest.iq
##
## Welch Two Sample t-test
##
## data: iq by groups
## t = -1.2601, df = 39.598, p-value = 0.215
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.246283 2.610128
## sample estimates:
## mean in group control mean in group treatment
## 111.2609 115.5789
We get a t-statistic of -1.26 and a p-value of 0.215.
(1) Do we reject the null hypothesis?
(2) What does the p-value actually mean?
In Lecture 7 I showed you a simulation to illustrate the key property of the p-value: When the null hypothesis is true, the p-value follows the Uniform[0,1] distribution. This is a very useful fact to keep in mind, but it seems rather abstract.
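To make that property more concrete, here is a small simulation in the same spirit as the Lecture 7 demo (a reconstruction, not the exact code from lecture): when both groups are drawn from the same distribution, the null is true, and the resulting p-values should look Uniform[0,1].

```r
# Simulate many two-sample t-tests where the null hypothesis is TRUE:
# both groups come from the same distribution, so any observed
# difference in means is pure noise.
set.seed(1)
pvals <- replicate(10000, {
  x <- rnorm(25)
  y <- rnorm(25)
  t.test(x, y)$p.value
})
mean(pvals < 0.05)  # close to 0.05, as Uniform[0,1] predicts
hist(pvals)         # histogram is approximately flat
```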
For the purpose of data analysis, it’s more helpful to think about the p-value as you’ve likely seen it defined in your statistics classes: The p-value is the probability of observing a value of the test statistic at least as large/extreme/surprising as the one we saw, assuming the null hypothesis is true. In the IQ example, the p-value is just the probability of observing a t-statistic at least as large in absolute value as the one we saw, \(|t| = 1.26\), if the drug actually had no effect on IQ.
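This definition connects directly to the t.test output above: plugging the reported t statistic and Welch degrees of freedom into the t distribution recovers the reported p-value.

```r
# Two-sided p-value: the probability, under the null, of a t-statistic
# at least as extreme in absolute value as the one observed.
t.stat <- -1.2601   # t statistic from the output above
df <- 39.598        # Welch degrees of freedom from the output above
2 * pt(-abs(t.stat), df = df)   # approximately 0.215
```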
(3) Why do we calculate a t-statistic? Why can’t I just look at the difference in means directly?
The t-statistic is a so-called “pivotal quantity”: Under the null, its distribution doesn’t depend on unknown quantities such as (population) means or standard deviations. The distribution of the difference in sample averages \(\hat\Delta = \bar{IQ}_{treat} - \bar{IQ}_{control}\) does depend on unknown parameters, even under the null. The distribution of \(\hat\Delta\) depends on the (population) standard deviation of treatment and control outcomes. The distribution of the t-statistic depends only on the sample size, which is observable.
That said, the t-statistic is just a normalized version of \(\hat\Delta\), so in interpreting the p-value we can say:
- The p-value is the probability that we would observe a difference in average IQ between the treatment and control groups at least as large as the one we did if the drug actually had no effect.
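As a sketch of that normalization, the Welch t-statistic can be computed by hand and checked against t.test(). Since drug.data is not included in these notes, the check below uses simulated groups with made-up sizes and parameters.

```r
# Welch t-statistic: the difference in sample means divided by its
# estimated standard error (variances not assumed equal).
welch.t <- function(x, y) {
  delta.hat <- mean(x) - mean(y)
  se <- sqrt(var(x) / length(x) + var(y) / length(y))
  delta.hat / se
}

# Check against t.test() on simulated groups (parameters invented):
set.seed(1)
x <- rnorm(19, mean = 115, sd = 12)  # "treatment"
y <- rnorm(23, mean = 111, sd = 12)  # "control"
c(by.hand = welch.t(x, y),
  t.test  = unname(t.test(x, y)$statistic))
```

Note that with the formula interface, t.test(iq ~ groups, ...) subtracts the second factor level from the first (control minus treatment), which is why the statistic in the output above is negative.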
(4) Can we say that the probability the drug had no effect is 0.215?