11.5 Post Hoc Tests
When testing the relationship between your explanatory variable (\(X\)) and your response variable (\(Y\)) in the context of ANOVA, your categorical explanatory variable (\(X\)) may have more than two levels.
For example, when we examine the differences in mean GPA (\(Y\)) across different college years (\(X\) = freshman, sophomore, junior and senior) or the differences in mean frustration level (\(Y\)) by college major (\(X\) = Business, English, Mathematics, Psychology), there is just one alternative hypothesis, which claims that there is a relationship between \(X\) and \(Y\).
When the null hypothesis is rejected, the conclusion is that not all the means are equal.
Note that there are many ways for \(\mu_1, \mu_2, \mu_3, \mu_4\) not to be all equal, and \(\mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4\) (all four means differing from one another) is just one of them. Other possibilities include \(\mu_1 = \mu_2 = \mu_3 \neq \mu_4\) and \(\mu_1 = \mu_2 \neq \mu_3 \neq \mu_4\).
In the case where the explanatory variable (\(X\)) represents more than two groups, a significant ANOVA F test does not tell us which groups are different from the others.
To determine which groups are different from the others, we would need to perform post hoc tests. These tests, done after the ANOVA, are generally termed post hoc paired comparisons.
Post hoc (meaning “after the fact” or “after data collection”) paired comparisons must be conducted in a particular way in order to prevent an excessive Type I error rate.
A Type I error is one specific kind of incorrect decision about the null hypothesis: rejecting the null hypothesis (\(H_0\)) when it is in fact true. In other words, your p-value is sufficiently small for you to say that there is a real association, despite the fact that the differences you see are due to chance alone. The Type I error rate is the probability of making this error. It equals the significance level you choose for the test (not the p-value you observe) and is denoted by the Greek letter \(\alpha\) (alpha).
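To see what a Type I error rate means in practice, you can simulate many studies in which the null hypothesis is true and count how often it is rejected. The sketch below is only an illustration: the sample size of 30 per group, the 2000 repetitions, and the seed are arbitrary assumptions, not values from the text.

```r
# Simulate the Type I error rate of a two-sample t test when H0 is true.
set.seed(42)     # arbitrary seed, used only so the run is reproducible
alpha  <- 0.05   # significance level chosen before looking at the data
n_sims <- 2000   # number of simulated studies (arbitrary)

# Both groups come from the same normal population, so H0 is true
p_values <- replicate(n_sims, {
  group_a <- rnorm(30)
  group_b <- rnorm(30)
  t.test(group_a, group_b)$p.value
})

# Proportion of simulated studies that (wrongly) reject H0
type1_rate <- mean(p_values < alpha)
type1_rate
```

The proportion should land close to the chosen \(\alpha\), which is what "rejecting a true null hypothesis about 5 times out of 100" describes.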
Although a Type I error rate of 0.05 is conventionally considered acceptable (i.e., it is acceptable that 5 times out of 100 you will reject the null hypothesis when it is true), higher Type I error rates are not. If you were to use the significance level \(\alpha = 0.05\) for each of several paired comparisons (for example, three independent comparisons), then the Type I error rate across all three comparisons is \(1 - (1 - \alpha)^{\text{Number of comparisons}} = 1 - (1 - 0.05)^3 = 0.142625\). In other words, across the unprotected paired comparisons you will reject at least one true null hypothesis roughly 14 times out of 100.
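The arithmetic above is easy to reproduce. The helper name `familywise_rate` below is made up for this illustration. Note that the formula assumes independent comparisons, so applying it to all pairwise comparisons among four groups (which share data) gives only a rough guide.

```r
# Type I error rate across m independent comparisons, each run at level alpha
familywise_rate <- function(alpha, m) {
  1 - (1 - alpha)^m
}

# Three comparisons at alpha = 0.05: the value worked out in the text
familywise_rate(0.05, 3)

# All choose(4, 2) = 6 pairwise comparisons among four groups,
# treated as if they were independent (an approximation)
familywise_rate(0.05, choose(4, 2))
```

With six unprotected comparisons the rate rises above one in four, which is why protected post hoc procedures are used instead.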
The purpose of running protected post hoc tests is that they allow you to conduct multiple paired comparisons without inflating the Type I Error rate.
For ANOVA, you can use one of several post hoc procedures, each of which is designed to control the Type I error rate while performing paired comparisons (Duncan’s Multiple Range test, Dunnett’s Multiple Comparison test, Newman-Keuls test, Scheffé’s test, Tukey’s HSD test, Fisher’s LSD test, the Šidák correction).
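As a rough illustration of how such protection works, the sketch below adjusts three hypothetical unadjusted p-values (invented for this example, not taken from any data set). The Šidák adjustment is computed by hand because it is not one of the methods offered by base R's `p.adjust()`. The Bonferroni and Holm adjustments, which are not in the list above, are shown only for comparison.

```r
# Hypothetical unadjusted p-values from three pairwise comparisons
raw_p <- c(0.010, 0.020, 0.040)
m <- length(raw_p)

sidak      <- 1 - (1 - raw_p)^m                       # Sidak, computed by hand
bonferroni <- p.adjust(raw_p, method = "bonferroni")  # min(1, m * p)
holm       <- p.adjust(raw_p, method = "holm")        # step-down Bonferroni

adjusted <- cbind(raw_p, sidak, bonferroni, holm)
round(adjusted, 4)
```

Each adjusted p-value is compared with the usual 0.05, so a comparison that looked significant before adjustment (here, the one with a raw p-value of 0.040) may no longer be significant afterwards.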
Analysis of Variance
Analysis of variance assesses whether the means of two or more groups are statistically different from each other. This analysis is appropriate when you want to compare the means of a quantitative variable across \(k\) groups (the levels of a categorical variable) under certain assumptions (for example, constant variance across all \(k\) groups). The null hypothesis is that there is no difference in the mean of the quantitative variable across groups, while the alternative is that at least one group mean differs from the others.
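A minimal sketch of this setup on simulated data follows. The group labels, means, standard deviation, and sample sizes are all assumptions chosen for illustration. `bartlett.test()` is one of several ways to look at the constant-variance assumption, not the only one.

```r
# Simulated data: three hypothetical groups with equal spread, unequal means
set.seed(1)   # arbitrary seed for reproducibility
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  score = c(rnorm(20, mean = 10, sd = 2),
            rnorm(20, mean = 12, sd = 2),
            rnorm(20, mean = 15, sd = 2))
)

fit <- aov(score ~ group, data = sim)
summary(fit)                              # omnibus F test of H0: equal means

# Rough checks of the constant-variance assumption
tapply(sim$score, sim$group, sd)          # group standard deviations
bartlett.test(score ~ group, data = sim)  # formal test of equal variances
```

A small p-value in the `summary()` table says only that the group means are not all equal. Identifying which groups differ is the job of the post hoc comparisons.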
TukeyHSD(aov(Frustration.Score ~ Major, data = frustration))
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Frustration.Score ~ Major, data = frustration)
$Major
                            diff        lwr      upr     p adj
English-Business       4.4571429  2.8449899 6.069296 0.0000000
Mathematics-Business   5.8857143  4.2735614 7.497867 0.0000000
Psychology-Business    6.7142857  5.1021328 8.326439 0.0000000
Mathematics-English    1.4285714 -0.1835815 3.040724 0.1019527
Psychology-English     2.2571429  0.6449899 3.869296 0.0021515
Psychology-Mathematics 0.8285714 -0.7835815 2.440724 0.5411978
opar <- par(no.readonly = TRUE)
par(mar = c(5.1, 11.1, 4.1, 2.1), las = 1) # Enlarge left margin
plot(TukeyHSD(aov(Frustration.Score ~ Major, data = frustration)))
par(opar) # reset margins
Of the \(\binom{4}{2}=6\) pairwise differences, Tukey’s HSD suggests that all except Mathematics - English and Psychology - Mathematics are significant at the 0.05 level.
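Reading significant pairs off the table by eye works for six comparisons, but it can also be done in code. The sketch below uses the `PlantGrowth` data set that ships with R as a stand-in, since the `frustration` data may not be available to you. The 0.05 cutoff is the conventional choice assumed throughout this section.

```r
# PlantGrowth ships with R: plant weight under a control and two treatments
fit   <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)

# The $group component is a matrix with columns diff, lwr, upr and "p adj"
comparisons <- tukey$group

# Keep the pairs whose adjusted p-value is below 0.05
# (drop = FALSE keeps the matrix structure even if only one row remains)
significant <- comparisons[comparisons[, "p adj"] < 0.05, , drop = FALSE]
rownames(significant)
```

Equivalently, a pair is significant at the family-wise 0.05 level when its interval from `lwr` to `upr` does not contain zero, which is what the plot of a `TukeyHSD` object displays.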
Analysis of Variance Assignment
Post the syntax used to run your ANOVA to your private GitHub repository, along with the corresponding output and a few sentences of interpretation. When your original statistical test is significant and you are examining more than two groups (i.e., more than two levels of a categorical explanatory variable), you will also need to run and interpret post hoc paired comparisons.
Example of how to write results for ANOVA:
MEANS <- tapply(nesarc$DailyCigsSmoked, list(nesarc$TobaccoDependence), mean, na.rm = TRUE)
MEANS
No Nicotine Dependence    Nicotine Dependence
              11.41393               14.62782
SD <- tapply(nesarc$DailyCigsSmoked, list(nesarc$TobaccoDependence), sd, na.rm = TRUE)
SD
No Nicotine Dependence    Nicotine Dependence
              7.427612               9.152854
RES <- summary(aov(DailyCigsSmoked ~ TobaccoDependence, data = nesarc))
RES
                    Df Sum Sq Mean Sq F value   Pr(>F)
TobaccoDependence    1   3241    3241   44.68 3.42e-11 ***
Residuals         1313  95236      73
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5 observations deleted due to missingness
When examining the association between current number of cigarettes smoked (quantitative response) and past year nicotine dependence (categorical explanatory), an Analysis of Variance (ANOVA) revealed that among daily, young adult smokers (my sample), those with nicotine dependence reported smoking significantly more cigarettes per day (Mean = 14.6, s.d. \(\pm\) 9.2) compared to those without nicotine dependence (Mean = 11.4, s.d. \(\pm\) 7.4), F(1, 1313) = 44.7, p < 0.0001.
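The F statistic, its degrees of freedom, and the p-value quoted in a write-up like this can be pulled straight from the fitted model rather than retyped. The helper `report_anova()` below is a hypothetical convenience function written for this illustration, and it assumes a one-way `aov()` fit. It is demonstrated on R's built-in `PlantGrowth` data rather than on `nesarc`.

```r
# Build an "F(df1, df2) = ..., p ..." string from a one-way aov() fit
report_anova <- function(fit) {
  tab <- summary(fit)[[1]]     # ANOVA table: row 1 = factor, row 2 = residuals
  p   <- tab[1, "Pr(>F)"]
  # Report very small p-values as an inequality, as in the example above
  p_text <- if (p < 0.0001) "p < 0.0001" else sprintf("p = %.4f", p)
  sprintf("F(%d, %d) = %.2f, %s",
          as.integer(tab[1, "Df"]), as.integer(tab[2, "Df"]),
          tab[1, "F value"], p_text)
}

result <- report_anova(aov(weight ~ group, data = PlantGrowth))
result
```

The group means and standard deviations still come from `tapply()`, as shown above.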
Example of how to write post hoc ANOVA results:
nesarc$DCScat <- cut(nesarc$DailyCigsSmoked, breaks = c(0, 5, 10, 15, 20, 98), include.lowest = FALSE)
mod <- aov(NumberNicotineSymptoms ~ DCScat, data = nesarc)
RES <- summary(mod)
RES
              Df Sum Sq Mean Sq F value Pr(>F)
DCScat         4  22049    5512   31.95 <2e-16 ***
Residuals   1310 225997     173
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5 observations deleted due to missingness
tapply(nesarc$NumberNicotineSymptoms, nesarc$DCScat, mean)
   (0,5]   (5,10]  (10,15]  (15,20]  (20,98]
13.37751 17.79874 22.90299 23.56522 26.04598
TukeyHSD(mod)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = NumberNicotineSymptoms ~ DCScat, data = nesarc)
$DCScat
                      diff       lwr       upr     p adj
(5,10]-(0,5]     4.4212321  1.616191  7.226273 0.0001739
(10,15]-(0,5]    9.5254750  5.681531 13.369419 0.0000000
(15,20]-(0,5]   10.1877074  7.243633 13.131781 0.0000000
(20,98]-(0,5]   12.6684670  8.200190 17.136744 0.0000000
(10,15]-(5,10]   5.1042429  1.596411  8.612074 0.0007080
(15,20]-(5,10]   5.7664753  3.277188  8.255762 0.0000000
(20,98]-(5,10]   8.2472349  4.064595 12.429874 0.0000008
(15,20]-(10,15]  0.6622323 -2.957740  4.282205 0.9873982
(20,98]-(10,15]  3.1429919 -1.796859  8.082843 0.4109223
(20,98]-(15,20]  2.4807596 -1.796365  6.757884 0.5077204
opar <- par(no.readonly = TRUE)
par(mar = c(5.1, 8.1, 4.1, 2.1), las = 1) # Enlarge left margin
plot(TukeyHSD(mod))
par(opar)
ANOVA revealed that among daily, young adult smokers (my sample), number of cigarettes smoked per day (collapsed into 5 ordered categories, which is the categorical explanatory variable) and number of nicotine dependence symptoms (quantitative response variable) were significantly associated, F(4, 1310) = 31.95, p < 0.0001. Post hoc comparisons of mean number of nicotine dependence symptoms by pairs of cigarettes per day categories revealed that those individuals smoking more than 10 cigarettes per day (i.e., 11 to 15, 16 to 20, and >20) reported significantly more nicotine dependence symptoms compared to those smoking 10 or fewer cigarettes per day (i.e., 1 to 5 and 6 to 10 cigarettes per day). In addition, those smoking 6 to 10 cigarettes per day reported significantly more symptoms than those smoking 1 to 5, whereas the three heaviest-smoking categories did not differ significantly from one another.
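The categories in this example come from the `cut()` call used to create `DCScat`. The sketch below applies the same `breaks` to a handful of hypothetical cigarette counts (invented for illustration) to show how boundary values are assigned.

```r
# Hypothetical daily cigarette counts, including values at the bin edges
cigs <- c(0, 1, 5, 6, 10, 11, 20, 21, 98, 99)
bins <- cut(cigs, breaks = c(0, 5, 10, 15, 20, 98), include.lowest = FALSE)

# Each interval is open on the left and closed on the right, e.g. (0,5]
data.frame(cigs, bins)

# Values outside (0, 98] fall in no interval and become NA
table(bins, useNA = "ifany")
```

Because the intervals are closed on the right, a count of exactly 5 belongs to `(0,5]` and a count of exactly 10 to `(5,10]`, which matches the "1 to 5" and "6 to 10" wording used in the interpretation. Counts of 0 or above 98 are set to missing.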