12.4 Post Hoc Tests

For post hoc tests following a Chi-Square test, we use what is referred to as the Bonferroni Adjustment. Like the post hoc tests used in the context of ANOVA, this adjustment is used to counteract the inflation of Type I error that occurs when multiple comparisons are made. Following a significant Chi-Square test that includes an explanatory variable with 3 or more groups, we need to subset the data to each possible paired comparison. When interpreting these paired comparisons, rather than evaluating each test at the usual \(\alpha\)-level of 0.05, we divide 0.05 by the number of paired comparisons that we will be making. The result is our new \(\alpha\)-level. For example, suppose we have a significant Chi-Square test when examining the association between number of cigarettes smoked per day (a five-level categorical explanatory variable: 1 to 5; 6 to 10; 11 to 15; 16 to 20; and more than 20 cigarettes per day) and nicotine dependence (a two-level categorical response variable: yes vs. no). We will want to know which pairs of the five cigarette groups differ from one another with respect to rates of nicotine dependence.

In other words, we will make \(\binom{5}{2}=10\) comparisons (all possible pairs of groups): group 1 vs. 2, 1 vs. 3, 1 vs. 4, 1 vs. 5, 2 vs. 3, 2 vs. 4, 2 vs. 5, 3 vs. 4, 3 vs. 5, and 4 vs. 5. When we evaluate the p-value for each of these post hoc Chi-Square tests, we will use \(0.05/10 = 0.005\) as our \(\alpha\)-level. If the p-value is less than 0.005, we will reject the null hypothesis; if it is greater than 0.005, we will fail to reject the null hypothesis.
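
The adjusted \(\alpha\)-level can be computed directly in R. The lines below are a minimal sketch using only base R (choose() counts the possible pairs):

k <- 5              # number of cigarette groups
m <- choose(k, 2)   # number of pairwise comparisons (10)
0.05 / m            # Bonferroni-adjusted alpha-level
[1] 0.005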

# Cross-tabulate nicotine dependence by cigarettes-per-day category
NT <- xtabs(~ TobaccoDependence + DCScat, data = nesarc)
NT
                        DCScat
TobaccoDependence        (0,5] (5,10] (10,15] (15,20] (20,98]
  No Nicotine Dependence   130    210      43     114      20
  Nicotine Dependence      119    267      91     254      67
chisq.test(NT, correct = FALSE)

    Pearson's Chi-squared test

data:  NT
X-squared = 45.159, df = 4, p-value = 3.685e-09
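
Before drilling down into the pairwise comparisons, it is worth confirming that the chi-square approximation is reasonable, that is, that the expected cell counts are not too small. A minimal sketch, reusing the NT table created above:

E <- chisq.test(NT, correct = FALSE)$expected  # expected counts under independence
E
all(E >= 5)  # common rule of thumb for the chi-square approximation
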
# Post hoc pairwise tests: compare each p-value to 0.05/10 = 0.005
chisq.test(NT[, c(1, 2)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(1, 2)]
X-squared = 4.4003, df = 1, p-value = 0.03593
chisq.test(NT[, c(1, 3)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(1, 3)]
X-squared = 14.238, df = 1, p-value = 0.000161
chisq.test(NT[, c(1, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(1, 4)]
X-squared = 28, df = 1, p-value = 1.213e-07
chisq.test(NT[, c(1, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(1, 5)]
X-squared = 22.275, df = 1, p-value = 2.362e-06
chisq.test(NT[, c(2, 3)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(2, 3)]
X-squared = 6.1426, df = 1, p-value = 0.0132
chisq.test(NT[, c(2, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(2, 4)]
X-squared = 14.957, df = 1, p-value = 0.00011
chisq.test(NT[, c(2, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(2, 5)]
X-squared = 13.483, df = 1, p-value = 0.0002407
chisq.test(NT[, c(3, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(3, 4)]
X-squared = 0.056441, df = 1, p-value = 0.8122
chisq.test(NT[, c(3, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(3, 5)]
X-squared = 2.1439, df = 1, p-value = 0.1431
chisq.test(NT[, c(4, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  NT[, c(4, 5)]
X-squared = 2.1619, df = 1, p-value = 0.1415
# OR, equivalently, use chisq.post.hoc() from the fifer package
library(fifer)
chisq.post.hoc(NT, control = "bonferroni", popsInRows = FALSE)
Adjusted p-values used the bonferroni method.
            comparison  raw.p  adj.p
1     (0,5] vs. (5,10] 0.0416 0.4159
2    (0,5] vs. (10,15] 0.0002 0.0016
3    (0,5] vs. (15,20] 0.0000 0.0000
4    (0,5] vs. (20,98] 0.0000 0.0000
5   (5,10] vs. (10,15] 0.0133 0.1328
6   (5,10] vs. (15,20] 0.0001 0.0012
7   (5,10] vs. (20,98] 0.0002 0.0021
8  (10,15] vs. (15,20] 0.8282 1.0000
9  (10,15] vs. (20,98] 0.1705 1.0000
10 (15,20] vs. (20,98] 0.1522 1.0000
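
Comparing these Bonferroni-adjusted p-values to 0.05 is equivalent to comparing the raw p-values to \(0.05/10 = 0.005\). If the fifer package is not available, a similar table of raw and adjusted p-values can be built with base R. The sketch below reuses the NT table, generates the column pairs with combn(), and adjusts with p.adjust(); the raw p-values here come from the uncorrected Pearson tests shown above, so they will differ slightly from the values fifer reports.

pairs <- combn(ncol(NT), 2)  # all 10 pairs of cigarette categories
raw.p <- apply(pairs, 2, function(ind)
  chisq.test(NT[, ind], correct = FALSE)$p.value)
data.frame(
  comparison = apply(pairs, 2, function(ind)
    paste(colnames(NT)[ind], collapse = " vs. ")),
  raw.p = round(raw.p, 4),
  adj.p = round(p.adjust(raw.p, method = "bonferroni"), 4)
)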

Chi-Square Assignment

Post the syntax used to run a Chi-Square test, along with the corresponding output and a few sentences of interpretation, to your private GitHub repository.

Example of how to write results for Chi-Square tests:

When examining the association between lifetime major depression (categorical response) and past year nicotine dependence (categorical explanatory), a Chi-Square test of independence revealed that among daily, young adult smokers (my sample), those with past year nicotine dependence were more likely to have experienced major depression in their lifetime (36.17%) compared to those without past year nicotine dependence (12.67%), \(\chi^2 = 88.6\), df = 1, p < 0.0001.

T2 <- xtabs(~ TobaccoDependence + MajorDepression, data = nesarc)
prop.table(T2, 1)  # row proportions: depression rates within each dependence group
                        MajorDepression
TobaccoDependence        No Depression Yes Depression
  No Nicotine Dependence     0.8733205      0.1266795
  Nicotine Dependence        0.6382979      0.3617021
chisq.test(T2, correct = FALSE)

    Pearson's Chi-squared test

data:  T2
X-squared = 88.598, df = 1, p-value < 2.2e-16
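
The percentages quoted in the write-up come from the row proportions shown above. The standardized residuals stored in the chisq.test object can also be inspected to see which cells drive the association; a brief sketch, assuming the T2 table:

XT2 <- chisq.test(T2, correct = FALSE)
XT2$stdres                         # standardized residuals by cell
round(100 * prop.table(T2, 1), 2)  # row percentages (12.67% and 36.17%)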

Example of how to write post hoc Chi-Square results:

A Chi-Square test of independence revealed that among daily, young adult smokers (my sample), number of cigarettes smoked per day (collapsed into 5 ordered categories) and past year nicotine dependence (binary categorical variable) were significantly associated, \(\chi^2 = 45.16\), df = 4, p < 0.0001. Post hoc comparisons of rates of nicotine dependence by pairs of cigarettes-per-day categories revealed that higher rates of nicotine dependence were seen among those smoking more cigarettes, up through the 11 to 15 cigarettes per day group. By comparison, the prevalence of nicotine dependence was statistically similar among the groups smoking 11 to 15, 16 to 20, and more than 20 cigarettes per day.
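
The DCScat variable used below is the number of cigarettes smoked per day collapsed into these five ordered categories. Its labels match what base R's cut() produces with breaks at 0, 5, 10, 15, 20, and 98, so a variable like it could be created along the following lines; here DailyCigsSmoked is a hypothetical placeholder for whatever the raw cigarettes-per-day variable is called in your own data:

# DailyCigsSmoked is a placeholder name for the raw cigarettes-per-day variable
nesarc$DCScat <- cut(nesarc$DailyCigsSmoked, breaks = c(0, 5, 10, 15, 20, 98))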

T3 <- xtabs(~ TobaccoDependence + DCScat, data = nesarc)
T3
                        DCScat
TobaccoDependence        (0,5] (5,10] (10,15] (15,20] (20,98]
  No Nicotine Dependence   130    210      43     114      20
  Nicotine Dependence      119    267      91     254      67
prop.table(T3, 2)  # column proportions: dependence rates within each smoking category
                        DCScat
TobaccoDependence            (0,5]    (5,10]   (10,15]   (15,20]   (20,98]
  No Nicotine Dependence 0.5220884 0.4402516 0.3208955 0.3097826 0.2298851
  Nicotine Dependence    0.4779116 0.5597484 0.6791045 0.6902174 0.7701149
library(ggplot2)
ggplot(data = nesarc[(!is.na(nesarc$TobaccoDependence) & 
                        !is.na(nesarc$DCScat)), ], 
       aes(x = DCScat, fill = TobaccoDependence)) + 
  geom_bar(position = "fill") +
  theme_bw() +
  labs(x = "Daily Smoking Frequency", y = "Fraction") +
  guides(fill = guide_legend(reverse = TRUE))

chisq.test(T3, correct = FALSE)

    Pearson's Chi-squared test

data:  T3
X-squared = 45.159, df = 4, p-value = 3.685e-09
# Post hoc tests: compare each p-value to 0.05/10 = 0.005
chisq.test(T3[, c(1, 2)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(1, 2)]
X-squared = 4.4003, df = 1, p-value = 0.03593
chisq.test(T3[, c(1, 3)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(1, 3)]
X-squared = 14.238, df = 1, p-value = 0.000161
chisq.test(T3[, c(1, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(1, 4)]
X-squared = 28, df = 1, p-value = 1.213e-07
chisq.test(T3[, c(1, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(1, 5)]
X-squared = 22.275, df = 1, p-value = 2.362e-06
chisq.test(T3[, c(2, 3)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(2, 3)]
X-squared = 6.1426, df = 1, p-value = 0.0132
chisq.test(T3[, c(2, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(2, 4)]
X-squared = 14.957, df = 1, p-value = 0.00011
chisq.test(T3[, c(2, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(2, 5)]
X-squared = 13.483, df = 1, p-value = 0.0002407
chisq.test(T3[, c(3, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(3, 4)]
X-squared = 0.056441, df = 1, p-value = 0.8122
chisq.test(T3[, c(3, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(3, 5)]
X-squared = 2.1439, df = 1, p-value = 0.1431
chisq.test(T3[, c(4, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T3[, c(4, 5)]
X-squared = 2.1619, df = 1, p-value = 0.1415