12.3 The Idea of the Chi-Square Test

The idea behind the chi-square test, much like previous tests that we’ve introduced, is to measure how far the data are from what is claimed in the null hypothesis. The further the data are from the null hypothesis, the more evidence the data present against it. We’ll use our data to develop this idea. Our data are represented by the observed counts:

TA
        DroveDrunk
Gender    No Yes
  Female 122  16
  Male   404  77

How will we represent the null hypothesis?

In the previous tests we introduced, the null hypothesis was represented by the null value. Here there is not really a null value, but rather a claim that the two categorical variables (drunk driving and gender, in this case) are independent.

To represent the null hypothesis, we will calculate another set of counts — the counts that we would expect to see (instead of the observed ones) if drunk driving and gender were really independent (i.e., if \(H_0\) were true). For example, we actually observed 77 males who drove drunk; if drunk driving and gender were indeed independent (if \(H_0\) were true), how many male drunk drivers would we expect to see instead of 77? Similarly, we can ask the same kind of question about (and calculate) the other three cells in our table.

In other words, we will have two sets of counts:

  • the observed counts (the data)

  • the expected counts (if \(H_0\) were true)

We will measure how far the observed counts are from the expected ones. Ultimately, we will base our decision on the size of the discrepancy between what we observed and what we would expect to observe if \(H_0\) were true.

How are the expected counts calculated? Once again, we need a result from probability. Recall from the probability section that if events \(A\) and \(B\) are independent, then \(P(A \text{ and } B) = P(A) \times P(B)\). We use this rule to calculate the expected counts, one cell at a time.

Here again are the observed counts:

TA
        DroveDrunk
Gender    No Yes
  Female 122  16
  Male   404  77

If drunk driving and gender were independent, then:

\[P(\text{drunk and male}) = P(\text{drunk}) \times P(\text{male})\]

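From the table, 16 + 77 = 93 of the 619 drivers drove drunk, and 404 + 77 = 481 of them are male. Dividing these totals by the table total, we see that: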

\(P(\text{Drunk}) = 93/619\) and

\(P(\text{Male}) = 481/619\),

and so,

\(P(\text{Drunk and Male}) = (93 / 619) (481 / 619)\)

Therefore, since there are a total of 619 drivers, if drunk driving and gender were independent, the count of drunk male drivers that we would expect to see is:

\(619\times P(\text{Drunk and Male})=619(93/619)(481/619)=93\times 481/619 = 72.266559\)
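The same arithmetic can be checked in a few lines. This is a minimal Python sketch (standard library only, not part of the original R workflow) that derives the marginal totals from the observed counts, applies the independence rule \(P(\text{Drunk and Male}) = P(\text{Drunk}) \times P(\text{Male})\), and scales the result by the number of drivers. The variable names are illustrative choices.

```python
# Observed counts from the drunk-driving table (rows: Gender, columns: DroveDrunk)
female_no, female_yes = 122, 16
male_no, male_yes = 404, 77

total = female_no + female_yes + male_no + male_yes  # 619 drivers
drunk = female_yes + male_yes                        # 93 drove drunk
male = male_no + male_yes                            # 481 males

# Under H0 (independence): P(Drunk and Male) = P(Drunk) * P(Male)
p_drunk = drunk / total
p_male = male / total
p_drunk_and_male = p_drunk * p_male

# Expected count of drunk male drivers = total * P(Drunk and Male),
# which simplifies algebraically to drunk * male / total
expected_drunk_male = total * p_drunk_and_male
print(round(expected_drunk_male, 6))  # 72.266559
```

Notice that the two factors of 619 in the denominators cancel against the leading 619, leaving \(93 \times 481 / 619\).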

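Notice that this expression is the product of the column total (93) and the row total (481) for that particular cell, divided by the overall table total (619). Applying the same calculation to every cell gives the full table of expected counts, which R reports as the expected component of chisq.test():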

chisq.test(TA)$expected
        DroveDrunk
Gender         No      Yes
  Female 117.2666 20.73344
  Male   408.7334 72.26656

This pattern holds for every cell, and it streamlines our calculations:

\[\text{Expected Count} = \frac{\text{Column Total} \times \text{Row Total} }{\text{Table Total}}\]
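To illustrate the shortcut, here is a small Python sketch (standard library only, offered as an illustration alongside the R output above) that applies \(\text{Column Total} \times \text{Row Total} / \text{Table Total}\) to every cell. Rounded to four decimals, it should agree with what chisq.test(TA)$expected prints. The list layout (rows Female, Male; columns No, Yes) is an assumption chosen to mirror the R table.

```python
# Observed counts: rows are Gender (Female, Male), columns are DroveDrunk (No, Yes)
observed = [
    [122, 16],   # Female
    [404, 77],   # Male
]

row_totals = [sum(row) for row in observed]        # [138, 481]
col_totals = [sum(col) for col in zip(*observed)]  # [526, 93]
table_total = sum(row_totals)                      # 619

# Expected count for each cell = (column total * row total) / table total
expected = [
    [col_totals[j] * row_totals[i] / table_total for j in range(len(col_totals))]
    for i in range(len(row_totals))
]

for label, row in zip(["Female", "Male"], expected):
    print(label, [round(e, 4) for e in row])
# Female [117.2666, 20.7334]
# Male [408.7334, 72.2666]
```

A useful sanity check is that the expected counts have the same row and column totals as the observed counts. Only the way the counts are distributed within the table changes.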

Step 3: Finding the p-value

The p-value for the chi-square test for independence is the probability of getting counts at least as far from the expected counts as those we observed, assuming that the two variables are not related (which is what the null hypothesis claims). The smaller the p-value, the more surprising it would be to get counts like ours if the null hypothesis were true.

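Technically, the p-value is the probability of observing a test statistic \(\chi^2\) at least as large as the one observed, where \(\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}\) is summed over all cells of the table. Using statistical software, we find that the p-value for this test is 0.2007975. (In R, the argument correct = FALSE turns off the Yates continuity correction that chisq.test applies to 2×2 tables by default, so the output below is the uncorrected Pearson statistic.)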

chisq.test(TA, correct = FALSE)

    Pearson's Chi-squared test

data:  TA
X-squared = 1.6366, df = 1, p-value = 0.2008
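As a cross-check on the R output, the following Python sketch (standard library only) recomputes the Pearson statistic \(\sum (\text{Observed} - \text{Expected})^2 / \text{Expected}\), the degrees of freedom \((r - 1)(c - 1)\), and the p-value. It relies on one stated assumption: the table has exactly one degree of freedom. In that case a chi-square variable is the square of a standard normal \(Z\), so \(P(\chi^2 \ge x) = P(|Z| \ge \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})\). This shortcut does not apply to larger tables.

```python
import math

# Observed counts: rows are Gender (Female, Male), columns are DroveDrunk (No, Yes)
observed = [[122, 16], [404, 77]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under H0: row total * column total / table total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1), which is 1 for a 2x2 table
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Shortcut valid ONLY when df == 1: P(chi-square >= x) = erfc(sqrt(x / 2))
assert df == 1, "the erfc shortcut only applies to one degree of freedom"
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(f"X-squared = {chi_sq:.4f}, df = {df}, p-value = {p_value:.4f}")
# X-squared = 1.6366, df = 1, p-value = 0.2008
```

For tables with more than one degree of freedom, a general chi-square upper-tail function is needed instead, for example scipy.stats.chi2.sf(x, df) if SciPy is available.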

Step 4: Stating the conclusion in context

As usual, we use the magnitude of the p-value to draw our conclusions. A small p-value indicates that the evidence provided by the data is strong enough to reject \(H_0\) and conclude (beyond a reasonable doubt) that the two variables are related. In particular, if a significance level of 0.05 is used, we will reject \(H_0\) if the p-value is less than 0.05.
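The decision rule itself amounts to a single comparison. The Python sketch below uses a hypothetical helper named decide (not from the source or from any library) that compares a p-value with a chosen significance level, defaulting to the 0.05 used above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Hypothetical helper: the usual decision rule for a hypothesis test."""
    # Reject H0 only when the p-value falls below the significance level alpha
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.2007975))  # fail to reject H0
```

Failing to reject \(H_0\) does not prove that the variables are independent. It means only that the data do not provide strong enough evidence of a relationship.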

Example

A p-value of 0.2007975 is not small at all. There is no compelling statistical evidence to reject \(H_0\), and so we will continue to assume it may be true. The data are consistent with gender and drunk driving being independent, so they provide no statistical support for a law that forbids the sale of 3.2% beer to males while permitting it to females. In fact, the Supreme Court, by a 7-2 majority, struck down the Oklahoma law as discriminatory and unjustified. In the majority opinion Justice Brennan wrote (http://www.law.umkc.edu/faculty/projects/ftrials/conlaw/craig.html):

“Clearly, the protection of public health and safety represents an important function of state and local governments. However, appellees’ statistics in our view cannot support the conclusion that the gender-based distinction closely serves to achieve that objective and therefore the distinction cannot under [prior case law] withstand equal protection challenge.”