Barnsley et al. (1992) investigated the relationship between month of birth and achievement in sport. Birth dates were collected for players on teams competing in the 1990 World Cup soccer games.
Observed <- c(150, 138, 140, 100)
names(Observed) <- c("Aug-Oct", "Nov-Jan", "Feb-April", "May-July")
Observed
  Aug-Oct   Nov-Jan Feb-April  May-July
      150       138       140       100
We wish to test whether these data are consistent with the hypothesis that birthdays of soccer players are uniformly distributed across the four quarters of the year. Let \(p_i\) denote the probability of a birth occurring in the \(i^{\text{th}}\) quarter; the hypotheses are as follows:
\(H_{0}: p_1 = \tfrac{1}{4}, p_2 = \tfrac{1}{4}, p_3 = \tfrac{1}{4}, p_4 = \tfrac{1}{4}\)
\(H_A: p_i \neq \tfrac{1}{4} \text{ for at least one } i.\)
There were a total of \(n = 528\) players considered for this study, so the expected count for each quarter is \(528/4 = 132\).
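As a check on the arithmetic, the chi-square goodness-of-fit statistic can be worked out by hand from the observed counts and the common expected count of 132. The symbols \(O_i\) and \(E_i\) for the observed and expected counts are introduced here only for illustration:

\[
\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}
= \frac{(150-132)^2 + (138-132)^2 + (140-132)^2 + (100-132)^2}{132}
= \frac{324 + 36 + 64 + 1024}{132}
= \frac{1448}{132} \approx 10.97.
\]

Under \(H_0\), this statistic is approximately chi-square distributed with \(4 - 1 = 3\) degrees of freedom.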
chisq.test(Observed, p = c(1/4, 1/4, 1/4, 1/4))
Chi-squared test for given probabilities
data: Observed
X-squared = 10.97, df = 3, p-value = 0.01189
The p-value of about 0.012 is small, so the evidence suggests that birthdays of World Cup soccer players are not uniformly distributed across the four quarters of the year.
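To see where the departure from uniformity comes from, one option is to recompute the test by hand and inspect the Pearson residuals. The following is a minimal sketch, not part of the original analysis; the names `Expected`, `stat`, `p_value`, and `pearson_resid` are introduced here for illustration.

```r
# Observed counts per birth quarter (copied from the example above)
Observed <- c(150, 138, 140, 100)
names(Observed) <- c("Aug-Oct", "Nov-Jan", "Feb-April", "May-July")

# Expected counts under H0: each quarter has probability 1/4
Expected <- sum(Observed) * rep(1/4, 4)

# Chi-square statistic and its upper-tail p-value on 3 degrees of freedom
stat <- sum((Observed - Expected)^2 / Expected)
p_value <- pchisq(stat, df = length(Observed) - 1, lower.tail = FALSE)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid <- (Observed - Expected) / sqrt(Expected)

round(stat, 2)
round(p_value, 5)
round(pearson_resid, 2)
```

The May-July residual is negative and the largest in magnitude, so that quarter contributes most to the statistic: it has noticeably fewer births than the uniform hypothesis predicts.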
Suppose you draw 100 numbers at random from an unknown distribution. Thirty values fall in the interval \((0, 0.25]\), 30 fall in \((0.25, 0.75]\), 22 fall in \((0.75, 1.25]\), and the remaining 18 fall in \((1.25, \infty)\). Your friend claims that the distribution is exponential with parameter \(\lambda = 1\). Do you believe her?
\[f(x) = \lambda e^{-\lambda x},\quad x \geq 0.\]
We wish to test the following:
\(H_0:\) The data are from an exponential distribution with \(\lambda = 1\).
\(H_A:\) The data are not from an exponential distribution with \(\lambda = 1\).
If \(X \sim \text{Exp}(\lambda = 1)\), the probabilities for each interval are as follows:
\(p_1 = P(0 < X \leq 0.25)=\int_0^{0.25}e^{-x}\,dx \approx 0.2211992\)
\(p_2 = P(0.25 < X \leq 0.75)=\int_{0.25}^{0.75}e^{-x}\,dx \approx 0.3064342\)
\(p_3 = P(0.75 < X \leq 1.25)=\int_{0.75}^{1.25}e^{-x}\,dx \approx 0.1858618\)
\(p_4 = P(X > 1.25)=\int_{1.25}^{\infty}e^{-x}\,dx \approx 0.2865048\)
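Each of these integrals has a simple closed form, because an antiderivative of \(e^{-x}\) is \(-e^{-x}\). For interval endpoints \(a < b\) (symbols used here only for illustration):

\[
\int_a^b e^{-x}\,dx = e^{-a} - e^{-b}, \qquad \int_a^{\infty} e^{-x}\,dx = e^{-a}.
\]

For example, \(p_2 = e^{-0.25} - e^{-0.75} \approx 0.7788008 - 0.4723666 = 0.3064342\). Because the terms telescope, the four probabilities sum to 1.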
p1 <- pexp(0.25, 1)
p2 <- pexp(0.75, 1) - pexp(0.25, 1)
p3 <- pexp(1.25, 1) - pexp(0.75, 1)
p4 <- pexp(1.25, 1, lower.tail = FALSE)
ps <- c(p1, p2, p3, p4)
ps
[1] 0.2211992 0.3064342 0.1858618 0.2865048
EXP <- ps*100
EXP
[1] 22.11992 30.64342 18.58618 28.65048
OBS <- c(30, 30, 22, 18)
test_stat <- sum((OBS - EXP)^2/EXP)
test_stat
[1] 7.406963
# Another approach
chisq.test(OBS, p = ps)
Chi-squared test for given probabilities
data: OBS
X-squared = 7.407, df = 3, p-value = 0.06
pvalue <- chisq.test(OBS, p = ps)$p.value
pvalue
[1] 0.05999777
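The p-value reported by `chisq.test` can also be recovered directly from the hand-computed statistic. The sketch below is one way to do this, assuming the same counts and intervals as above; the names `breaks` and `alpha` are introduced for illustration, and the significance level 0.05 is an assumption rather than something stated in the example.

```r
# Observed counts and Exp(1) cell probabilities from the example
OBS <- c(30, 30, 22, 18)
breaks <- c(0, 0.25, 0.75, 1.25, Inf)   # interval endpoints (illustrative name)
ps <- diff(pexp(breaks, rate = 1))      # P(a < X <= b) for each interval

# Expected counts and the chi-square statistic
EXP <- sum(OBS) * ps
test_stat <- sum((OBS - EXP)^2 / EXP)

# Upper-tail chi-square probability with (number of cells - 1) degrees of freedom
pvalue <- pchisq(test_stat, df = length(OBS) - 1, lower.tail = FALSE)
pvalue

# Compare with a chosen significance level (alpha = 0.05 is an assumption here)
alpha <- 0.05
pvalue < alpha
```

At that assumed level, the p-value of about 0.06 falls just short of significance, so these data do not give strong grounds to reject your friend's claim.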