General Social Survey

To obtain information from the General Social Survey, go to sda.berkley.edu. Click the ARCHIVE button. In what follows, we will use information obtained from the General Social Survey (GSS) Cumulative Datafile 1972-2010. The following data file is obtained by clicking Download > Customized Subset. Click in the radio button to the left of CSV file (Comma Separated Values with header record). Enter age(18-65) in the Selection Filter(s) box. Type DEATHPEN SEX in the Enter names of individual variables (original or created) to include box. Scroll to the bottom of the page, and click the Continue button. Review the page that appears to make sure you have selected the desired variables; then, click the Create the Files button. You should have two hot links: the Data file and the Codebook. By clicking on the Data file link, the information should open in your browser. To read the data into R, copy the URL and store it as follows. Note the URL site is temporary. Consequently, we download the data to a folder on the hard drive. The initial code is commented out as the URL will no longer work after a day or so. That is, you will need to get a new URL if you want to duplicate the entire process.

# site <- "http://sda.berkeley.edu/TMPDIR/AAELN0Q9.csv"
# download.file(url = site, destfile = "../data/DPS.csv")
DPS <- read.csv(file="../data/DPS.csv")
xtabs(~SEX + DEATHPEN, data = DPS)

   DEATHPEN
SEX     0     1     2     3     4     5     8     9
  1 20263   257   103    28    37    35    11     0
  2 24601   283   160    58    67    37    32     2

Based on the Codebook, values 0, 8, and 9 are missing data values for the variable DEATHPEN. We would like to store those values in our data frame as NA’s (how R stores missing data). Further, it would be nice to label the various categories according to their labels versus their numerical values. To map character values to the numerical entries, we will use the plyr package (it can also be done using straight R commands).

DPS$DEATHPEN <- plyr::mapvalues(DPS$DEATHPEN, from = c(0, 1, 2, 3, 4, 5, 8, 9),
                  to =c(NA, "Strongly Agree", "Agree", "Neither Agree nor Disagree",
                        "Disagree", "Strongly Disagree", NA, NA))
DPS$SEX <- plyr::mapvalues(DPS$SEX, from = c(1, 2), to = c("Male", "Female"))
xtabs(~SEX + DEATHPEN, data = DPS)

        DEATHPEN
SEX      Agree Disagree Neither Agree nor Disagree Strongly Agree
  Female   160       67                         58            283
  Male     103       37                         28            257
        DEATHPEN
SEX      Strongly Disagree
  Female                37
  Male                  35

The problem with the latest table is that the labels for the values appear in alphabetical order. To solve this problem, we convert the character variables, SEX and DEATHPEN to factors and assign the order of the levels using the levels= command.

DPS$DEATHPEN <- factor(DPS$DEATHPEN, levels = c("Strongly Agree", "Agree", "Neither Agree nor Disagree", "Disagree", "Strongly Disagree"))
DPS$SEX <- factor(DPS$SEX, levels = c("Male", "Female"))
T1 <- xtabs(~SEX + DEATHPEN, data = DPS)
T1

        DEATHPEN
SEX      Strongly Agree Agree Neither Agree nor Disagree Disagree
  Male              257   103                         28       37
  Female            283   160                         58       67
        DEATHPEN
SEX      Strongly Disagree
  Male                  35
  Female                37

addmargins(T1)

        DEATHPEN
SEX      Strongly Agree Agree Neither Agree nor Disagree Disagree
  Male              257   103                         28       37
  Female            283   160                         58       67
  Sum               540   263                         86      104
        DEATHPEN
SEX      Strongly Disagree  Sum
  Male                  35  460
  Female                37  605
  Sum                   72 1065

prop.table(T1, 1)

        DEATHPEN
SEX      Strongly Agree      Agree Neither Agree nor Disagree   Disagree
  Male       0.55869565 0.22391304                 0.06086957 0.08043478
  Female     0.46776860 0.26446281                 0.09586777 0.11074380
        DEATHPEN
SEX      Strongly Disagree
  Male          0.07608696
  Female        0.06115702

Test for Independence

If SEX and DEATHPEN are independent, the expected count \(E_{ij}\) for any cell is the row total times the column proportion, which is equivalent to the column total times the row proportion: \[E_{ij} = R_i(C_j/n) = C_j(R_i/n) = R_iC_j/n.\]

If SEX and DEATHPEN are independent, then the observed and the expected values for all cells should be similar. The degree of similarity is measured with the statistic \[\sum_{cells}\dfrac{(OBS - EXP)^2}{EXP}\]

Null and Alternative Hypotheses

The hypotheses to be tested are written as follows:

\(H_0:\) Death penalty opinion is independent of gender (there is no association between death penalty opinion and gender).

\(H_A:\) Death penalty opinion is dependent on gender (there is an association between death penalty opinion and gender).

`R` interlude

Consider how the outer() function works, which we will use to compute expected values.

outer(1:3, 1:3)

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9

outer(1:3, 3:1)

     [,1] [,2] [,3]
[1,]    3    2    1
[2,]    6    4    2
[3,]    9    6    3

outer(1:3, 3:1, "+")

     [,1] [,2] [,3]
[1,]    4    3    2
[2,]    5    4    3
[3,]    6    5    4

outer(1:3, 3:1, "^")

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    8    4    2
[3,]   27    9    3

The expected counts for T1 are

EC <- function(TAB){
  outer(apply(TAB,1 , sum), apply(TAB, 2, sum))/sum(TAB)
}
EC(T1)

       Strongly Agree    Agree Neither Agree nor Disagree Disagree
Male         233.2394 113.5962                   37.14554 44.92019
Female       306.7606 149.4038                   48.85446 59.07981
       Strongly Disagree
Male            31.09859
Female          40.90141

To remove all of the NA values from DPS, use na.omit() as follows.

str(DPS)

'data.frame':   45974 obs. of  3 variables:
 $ CASEID  : int  1 3 4 5 6 7 8 9 10 11 ...
 $ DEATHPEN: Factor w/ 5 levels "Strongly Agree",..: NA NA NA NA NA NA NA NA NA NA ...
 $ SEX     : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 1 1 2 2 2 ...

DPSC <- na.omit(DPS)
str(DPSC)

'data.frame':   1065 obs. of  3 variables:
 $ CASEID  : int  26266 26267 26268 26270 26271 26272 26274 26275 26278 26279 ...
 $ DEATHPEN: Factor w/ 5 levels "Strongly Agree",..: 1 1 2 1 2 3 2 1 1 2 ...
 $ SEX     : Factor w/ 2 levels "Male","Female": 2 2 1 2 1 1 2 2 1 1 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:44909] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..- attr(*, "names")= chr [1:44909] "1" "2" "3" "4" ...

Note that DPSC has 1065 observations where DPS has 45974 observations, but 44909 of the observations in DPS are NA. To perform a permutation test, we will need to use the cleaned up version of the data frame DPSC.

Permutation Test for Independence

To perform a permutation test for independence between two categorical variables, the data will need to be stored in two columns. For example, if a contingency tables has 1065 entries, that data will need to be stored in an object with 1065 rows and a column for each categorical variable. If the null hypothesis that SEX and DEATHPEN are independent is correct, then we could permute either the SEX or DEATHPEN values, and any other permutation would be equally likely. For each permutation resample, we will obtain a contingency table and compute the statistic

\[\sum_{cells}\frac{(OBS - EXP)^2}{EXP}\]

for that permutation. Note that for every resample, the row and column totals in the contingency table are the same; only the counts in the table change.

set.seed(123)
addmargins(table(DPSC$SEX, sample(DPSC$DEATHPEN)))

        
         Strongly Agree Agree Neither Agree nor Disagree Disagree
  Male              256   107                         27       48
  Female            284   156                         59       56
  Sum               540   263                         86      104
        
         Strongly Disagree  Sum
  Male                  22  460
  Female                50  605
  Sum                   72 1065

addmargins(table(DPSC$SEX, sample(DPSC$DEATHPEN)))

        
         Strongly Agree Agree Neither Agree nor Disagree Disagree
  Male              230   115                         38       43
  Female            310   148                         48       61
  Sum               540   263                         86      104
        
         Strongly Disagree  Sum
  Male                  34  460
  Female                38  605
  Sum                   72 1065

addmargins(xtabs(~sample(SEX) + DEATHPEN, data = DPSC))

           DEATHPEN
sample(SEX) Strongly Agree Agree Neither Agree nor Disagree Disagree
     Male              232   117                         35       47
     Female            308   146                         51       57
     Sum               540   263                         86      104
           DEATHPEN
sample(SEX) Strongly Disagree  Sum
     Male                  29  460
     Female                43  605
     Sum                   72 1065

addmargins(xtabs(~sample(SEX) + DEATHPEN, data = DPSC))

           DEATHPEN
sample(SEX) Strongly Agree Agree Neither Agree nor Disagree Disagree
     Male              218   126                         38       46
     Female            322   137                         48       58
     Sum               540   263                         86      104
           DEATHPEN
sample(SEX) Strongly Disagree  Sum
     Male                  32  460
     Female                40  605
     Sum                   72 1065

To perform the permutation test, randomly permute one of the categorical variables. Each time the categorical variable is permuted, compute the statistic \(\sum_{cells}\frac{(OBS - EXP)^2}{EXP}\) on the resulting contingency table and store the result. Repeat this process a large number of times. The \(p\)-value will be the fraction of times the simulated statistics equal or exceed the value of the observed statistic.

N <- 10^4 - 1  # Change this for slower computers
result <- numeric(N)
set.seed(3)
for(i in 1:N){
# T2 <- table(sample(DPSC$SEX), DPSC$DEATHPEN)
  T2 <- xtabs(~sample(SEX) + DEATHPEN, data =DPSC)
result[i] <- chisq.test(T2)$statistic  
}
obs <- chisq.test(xtabs(~SEX + DEATHPEN, data = DPSC))$statistic
pvalue <- (sum(result >= obs) + 1)/(N + 1)
pvalue

[1] 0.0096

pvalueCH <- chisq.test(xtabs(~SEX + DEATHPEN, data = DPSC))$p.value
pvalueCH

[1] 0.009966395

# Or chisq.test(DPSC$SEX, DPSC$DEATHPEN)$p.value

Using base graphics

hist(result, breaks = "Scott", col = "pink", freq=FALSE, main = "")
curve(dchisq(x, 4), 0, 20, add=TRUE, col = "red", lwd = 4)

Figure 1: Simulated permutation distribution with base graphics

Using ggplot2 now

library(ggplot2)
DF <- data.frame(x = result)
p <- ggplot(data = DF, aes(x= x)) + geom_density(fill = "pink") + theme_bw()
p + stat_function(fun = dchisq, args = list(df = 4), color = "red", lwd = 1)

Figure 2: Simulated permutation distribution with ggplot2

The simulated permutation \(p\)-value is 0.0096. The \(p\)-value that is returned from the chisq.test() is 0.0099664. In this case, the two \(p\)-values are fairly similar. This will not always be the case.

Formatting the Data

Some times, you will have only access to data that has been summarized (contingency tables). To get your data into the needed format, you may want to use the following function written by Marc Schwartz which will take a contingency table and convert it to a flat file. The function expand.dft() is also in the package vcdExtra.

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
    # Take each row in the source data frame table and replicate it
    # using the Freq value
    DF <- sapply(1:nrow(x), 
                 function(i) x[rep(i, each = x$Freq[i]), ],
                 simplify = FALSE)

    # Take the above list and rbind it to create a single DF
    # Also subset the result to eliminate the Freq column
    DF <- subset(do.call("rbind", DF), select = -Freq)

    # Now apply type.convert to the character coerced factor columns  
    # to facilitate data type selection for each column 
    for (i in 1:ncol(DF)) {
        DF[[i]] <- type.convert(as.character(DF[[i]]),
                                na.strings = na.strings,
                                as.is = as.is, dec = dec)
    }

    DF
}

Example Conversion

The x= argument to expand.dft() is a table that has been converted to a data frame. In the following example, a matrix of values is created that resembles a contingency table. To convert the matrix to a table, we use the function as.table().To convert the table to a data frame, we use the function as.data.frame(). Finally, we apply the expand.dft() function to the contingency table that was converted to a data frame and stored in the object HADF.

HA <- c(110, 277, 50, 163, 302, 63)
HAM <- matrix(data = HA, nrow = 2, byrow = TRUE)
dimnames(HAM) <- list(Gender = c("Male", "Female"), Giddy = c("Very Happy", "Pretty Happy", "Not to Happy"))
HAM

        Giddy
Gender   Very Happy Pretty Happy Not to Happy
  Male          110          277           50
  Female        163          302           63

HAT <- as.table(HAM)
HAT

        Giddy
Gender   Very Happy Pretty Happy Not to Happy
  Male          110          277           50
  Female        163          302           63

addmargins(HAT)

        Giddy
Gender   Very Happy Pretty Happy Not to Happy Sum
  Male          110          277           50 437
  Female        163          302           63 528
  Sum           273          579          113 965

HADF <- as.data.frame(HAT)
HATflatfile <- vcdExtra::expand.dft(HADF)
head(HATflatfile)

  Gender      Giddy
1   Male Very Happy
2   Male Very Happy
3   Male Very Happy
4   Male Very Happy
5   Male Very Happy
6   Male Very Happy

str(HATflatfile)

'data.frame':   965 obs. of  2 variables:
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Giddy : Factor w/ 3 levels "Not to Happy",..: 3 3 3 3 3 3 3 3 3 3 ...

Questions

From the http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10 page, click Download > Customized Subset. Click in the radio button to the left of CSV file (Comma Separated Values with header record). Type DEATHPEN DEGREE in the Enter names of individual variables (original or created) to include box. Click in the radio button All for CASE IDENTIFICATION AND YEAR. Scroll to the bottom of the page, and click the Continue button. Review the page that appears to make sure you have selected the desired variables; then, click the Create the Files button. You should have two hot links: the Data file and the Codebook. By clicking on the Data file link, the information should open in your browser. To read the data into R, copy the URL and store it as follows.

# site <- "http://sda.berkeley.edu/TMPDIR/AAO0mzEh.csv"
# download.file(url = site, destfile = "../data/dpy.csv")
dpy <- read.csv(file = "../data/dpy.csv")
str(dpy)

'data.frame':   55087 obs. of  5 variables:
 $ CASEID  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ DEATHPEN: int  0 0 0 0 0 0 0 0 0 0 ...
 $ DEGREE  : int  3 0 1 3 1 1 1 3 1 1 ...
 $ YEAR    : int  1972 1972 1972 1972 1972 1972 1972 1972 1972 1972 ...
 $ ID      : int  1 2 3 4 5 6 7 8 9 10 ...

xtabs(~DEATHPEN + DEGREE, data = dpy)

        DEGREE
DEATHPEN     0     1     2     3     4     8     9
       0 12106 27564  2854  7450  3595    30   129
       1   120   381    30    79    29     0     3
       2    78   168    16    52    12     0     1
       3    20    63     4    14     3     0     0
       4    23    74     9    22     9     0     1
       5    12    36     4    20    12     0     0
       8    17    23     2    11     5     0     1
       9     3     2     0     0     0     0     0

What is the only YEAR these two questions were asked?

There are several ways to answer this question, here is the first one that comes to mind.

YQ <- subset(x = dpy, subset = DEATHPEN != 0 & DEGREE != 0)
head(YQ)

      CASEID DEATHPEN DEGREE YEAR ID
26266  26266        1      1 1991  1
26267  26267        1      4 1991  2
26268  26268        2      4 1991  3
26269  26269        8      3 1991  4
26270  26270        1      1 1991  5
26273  26273        1      3 1991  8

xtabs(~YEAR, data = YQ)

YEAR
1991 
1086

To satisfy the curious, the answer is 1991.

Some code to answer Problem 10

women <- c(35, 146)
men <- c(8, 97)
stuff <- rbind(women, men)
dimnames(stuff) <- list(Gender =c("Women", "Men"), Diet=c("Yes", "No"))
stuff

       Diet
Gender  Yes  No
  Women  35 146
  Men     8  97

stuffT <- as.table(stuff)
stuffDF <- as.data.frame(stuffT)
head(stuffDF)

  Gender Diet Freq
1  Women  Yes   35
2    Men  Yes    8
3  Women   No  146
4    Men   No   97

DFL <- vcdExtra::expand.dft(stuffDF)
head(DFL)

  Gender Diet
1  Women  Yes
2  Women  Yes
3  Women  Yes
4  Women  Yes
5  Women  Yes
6  Women  Yes

set.seed(2)
N <- 10^4 - 1  # Change this for slower computers
result <- numeric(N)
for (i in 1:N) {
    T2 <- xtabs(~sample(Gender) + Diet, data = DFL)
    result[i] <- chisq.test(T2)$statistic
}
obs <- chisq.test(xtabs(~ Gender + Diet, data = DFL))$statistic
pvalue <- (sum(result >= obs) + 1)/(N + 1)
pvalue

[1] 0.0092

Test of Homogeneity

The major difference between testing for homogeneity and testing for independence is that you will have samples from two or more populations when testing for homogeneity. Recall that we only had one sample when we tested for independence. Consider testing whether the proportion of boys is the same as the proportions of girls that favor three different flavors of candy. That is, \(H_0: \pi_{B1} = \pi_{G1}, \pi_{B2} = \pi_{G2}, \pi_{B3} =\pi_{G3}\) versus \(H_A:\) at least one of the equalities does not hold.

candy <- c(42, 20, 38, 33, 27, 50)
candyM <- matrix(data = candy, nrow = 2, byrow = TRUE)
dimnames(candyM) <- list(Gender = c("Boys", "Girls"), Flavor = c("Flavor 1", "Flavor 2", "Flavor 3"))
candyM

       Flavor
Gender  Flavor 1 Flavor 2 Flavor 3
  Boys        42       20       38
  Girls       33       27       50

candyT <- as.table(candyM)
candyT

       Flavor
Gender  Flavor 1 Flavor 2 Flavor 3
  Boys        42       20       38
  Girls       33       27       50

addmargins(candyT)

       Flavor
Gender  Flavor 1 Flavor 2 Flavor 3 Sum
  Boys        42       20       38 100
  Girls       33       27       50 110
  Sum         75       47       88 210

candyDF <- as.data.frame(candyT)
candyflatfile <- vcdExtra::expand.dft(candyDF)
head(candyflatfile)

  Gender   Flavor
1   Boys Flavor 1
2   Boys Flavor 1
3   Boys Flavor 1
4   Boys Flavor 1
5   Boys Flavor 1
6   Boys Flavor 1

str(candyflatfile)

'data.frame':   210 obs. of  2 variables:
 $ Gender: Factor w/ 2 levels "Boys","Girls": 1 1 1 1 1 1 1 1 1 1 ...
 $ Flavor: Factor w/ 3 levels "Flavor 1","Flavor 2",..: 1 1 1 1 1 1 1 1 1 1 ...

E <- chisq.test(candyT)$expected
E

       Flavor
Gender  Flavor 1 Flavor 2 Flavor 3
  Boys  35.71429 22.38095 41.90476
  Girls 39.28571 24.61905 46.09524

obsstat <- chisq.test(candyT)$statistic
obsstat

X-squared 
 3.290187

# Now we will run a permutation test.
N <- 10^4 - 1  # Change this for slower computers
result <- numeric(N)
set.seed(1)
for (i in 1:N) {
    T2 <- xtabs(~sample(Gender) + Flavor, data = candyflatfile)
    result[i] <- chisq.test(T2)$statistic
}
pvalue <- (sum(result >= obsstat) + 1)/(N + 1)
pvalue

[1] 0.1941

In this case, there is no evidence to suggest the proportions of boys favoring flavors one, two, and three is any different than the proportions of girls favoring flavors one, two and three.

Goodness of Fit

A quality engineer has taken 50 samples of size 13 each from a production process. The numbers of defectives for these samples are given in the code. Test the null hypothesis at an \(\alpha =0.05\) level that the number of defective follows (a) the Poisson distribution, (b) the binomial distribution.

numberDefectives <- c(0, 1, 2, 3, 4, 5, "6 or more")
numberSamples <- c(10, 24, 10, 4, 1, 1, 0)
names(numberSamples) <- numberDefectives
numberSamples

        0         1         2         3         4         5 6 or more 
       10        24        10         4         1         1         0

Since no parameters are specified, they must be estimated from the data in order to carry out the test in both (a) and (b).

See Wikipedia for a summary of various probability distributions. The Poisson pdf is \(P(X =x|\lambda) = \frac{\lambda e^{-\lambda}}{x!}\) with \(E[X] =\lambda\). An estimate of the mean number of defectives in the 50 samples is 1.3.

muest <- sum(0:5*numberSamples[-7]/sum(numberSamples[-7]))
muest

[1] 1.3

weighted.mean(x = 0:5, w = numberSamples[-7]/sum(numberSamples[-7]))

[1] 1.3

Using \(\hat{\lambda} = 1.3\), we compute the probabilities for each category and subsequent expected values.

ps <- dpois(0:4, muest) 
p5m <- 1 - ppois(4, muest)
psf <- c(ps, p5m)
psf

[1] 0.27253179 0.35429133 0.23028937 0.09979206 0.03243242 0.01066303

exh <- psf*50
exh

[1] 13.6265897 17.7145665 11.5144683  4.9896029  1.6216209  0.5331517

((numberSamples[-7] - exh)^2/exh)

        0         1         2         3         4         5 
0.9651830 2.2301801 0.1991941 0.1962709 0.2382879 0.4087905

stat <- sum((numberSamples[-7] - exh)^2/exh)
stat

[1] 4.237906

epvalue <- pchisq(stat, 6-1-1, lower = FALSE)
epvalue

[1] 0.3747651

Since the \(p\)-value is 0.3747651, we can not reject the null hypothesis that the distribution follows a Poisson distribution. That is, there is no evidence to suggest the distribution is something other than the Poisson distribution.

One really needs the expected cell counts to be at least five. Consider collapsing the number of defectives greater than or equal to 3 to a single category.

ps <- dpois(0:2, muest) 
p3m <- 1 - ppois(2, muest)
psf <- c(ps, p3m)
psf

[1] 0.2725318 0.3542913 0.2302894 0.1428875

exh <- psf*50
exh

[1] 13.626590 17.714567 11.514468  7.144376

((numberSamples[-c(7, 6, 5)] + c(0, 0, 0, 2) - exh)^2/exh)

        0         1         2         3 
0.9651830 2.2301801 0.1991941 0.1833044

stat <- sum((numberSamples[-c(7, 6, 5)] + c(0, 0, 0, 2) - exh)^2/exh)
stat

[1] 3.577862

epvalue2 <- pchisq(stat, 4 - 1 - 1, lower = FALSE)
epvalue2

[1] 0.1671388

# Could use the following but the degrees of freedom will be incorrect for this test!
chisq.test(c(10, 24, 10, 6), p = psf)


    Chi-squared test for given probabilities

data:  c(10, 24, 10, 6)
X-squared = 3.5779, df = 3, p-value = 0.3108

Note the substantial drop in \(p\)-value (0.1671388) although the final conclusion is still the same.

For part (b), the null hypothesis is that the number of defectives in each sample of 13 follows the binomial distribution with \(n = 13\) and \(\pi\) equal the probability of a defective in any sample. An estimate of \(\pi\), \(\hat{\pi}\) is the total number of defectives (65) divided by the total number of observations (650). That is \(\hat{\pi} = 0.1\).

pihat <- 65/650
ps <- dbinom(0:2, 13, pihat) 
p3m <- 1 - pbinom(2, 13, pihat)
psf <- c(ps, p3m)
psf

[1] 0.2541866 0.3671584 0.2447723 0.1338828

exh <- psf*50
exh

[1] 12.709329 18.357920 12.238613  6.694138

((numberSamples[-c(7, 6, 5)] + c(0, 0, 0, 2) - exh)^2/exh)

         0          1          2          3 
0.57756506 1.73402370 0.40947362 0.07197749

stat <- sum((numberSamples[-c(7, 6, 5)] + c(0, 0, 0, 2)  - exh)^2/exh)
stat

[1] 2.79304

epvalue3 <- pchisq(stat, 4 - 1 - 1, lower = FALSE)
epvalue3

[1] 0.2474566

This example illustrates a common result with chi-square goodness-of-fit tests, i.e., that each to two (or more) different null hypotheses may be accepted for the same data set. Obviously, the true distribution cannot be both binomial and Poisson at the same time which is why the conclusion is that there is not sufficient evidence to suggest the alternative. This does not make the null hypothesis true!

Contingency Tables

Alan T. Arnholt

Last compiled: Sunday, October 30, 2016 - 16:09:33.

Test for Independence

Null and Alternative Hypotheses

`R` interlude

Permutation Test for Independence

Using base graphics

Using ggplot2 now

Formatting the Data

Example Conversion

Questions

Some code to answer Problem 10

Test of Homogeneity

Goodness of Fit

Contingency Tables

Alan T. Arnholt

Last compiled: Sunday, October 30, 2016 - 16:09:33.

General Social Survey

Test for Independence

Null and Alternative Hypotheses

R interlude

Permutation Test for Independence

Using base graphics

Using ggplot2 now

Formatting the Data

Example Conversion

Questions

Some code to answer Problem 10

Test of Homogeneity

Goodness of Fit

`R` interlude