library(tidyverse)
library(resampledata)
DT::datatable(TXBirths2004)
DF <- TXBirths2004 %>%
filter(Gender == "Male", Smoker == "No", Gestation < 37)
ggplot(data = DF, aes(x = Weight)) +
geom_density(fill = "purple", alpha = 0.5) +
theme_bw()
DF %>%
summarize(Mean = mean(Weight), SD = sd(Weight), n = n())
Mean SD n
1 2507.553 729.4068 76
set.seed(58)
thetahat <- mean(DF$Weight)
thetahat
[1] 2507.553
SEthetahat <- sd(DF$Weight)/sqrt(sum(!is.na(DF$Weight)))
SEthetahat
[1] 83.6687
# T Bootstrap
sims <- 10^4
Tstar <- numeric(sims)
for(i in 1:sims){
bs <- sample(DF$Weight, size = sum(!is.na(DF$Weight)), replace = TRUE)
Tstar[i] <- (mean(bs) - thetahat)/(sd(bs)/sqrt(sum(!is.na(DF$Weight))))
}
ggplot(data = data.frame(Tstar = Tstar), aes(x = Tstar)) +
geom_density(fill = "purple", alpha = 0.5) +
theme_bw()
TS <- quantile(Tstar, prob = c(0.025, 0.975))
TS
2.5% 97.5%
-1.870836 2.150859
# Compare to
qt(c(0.025, 0.975), sum(!is.na(DF$Weight)) - 1)
[1] -1.992102 1.992102
Note that \(Q_1 = t^*_{0.025} = -1.8708359\), and \(Q_2 = t^*_{0.975} = 2.1508594\). The bootstrap \(t\) confidence interval is \[CI_{1 - \alpha}(\mu)=[\bar{X} - Q_2\cdot S/\sqrt{n}, \bar{X} - Q_1\cdot S/\sqrt{n}]\] or \[CI_{1 - \alpha}(\mu)=[\bar{X} - t^*_{1 - \alpha/2}\cdot S/\sqrt{n}, \bar{X} - t^*_{\alpha/2}\cdot S/\sqrt{n}]\]
In this case, a 95% bootstrap \(t\) confidence interval is:
\[CI_{1 - \alpha}(\mu)=[2507.5526316 - 2.1508594\cdot 83.6687, 2507.5526316 - -1.8708359\cdot 83.6687]\]
\[CI_{1 - \alpha}(\mu)= [2327.5930179, 2664.0830354]\]
Compare to a standard \(t\) confidence interval:
t.test(DF$Weight)
One Sample t-test
data: DF$Weight
t = 29.97, df = 75, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
2340.876 2674.229
sample estimates:
mean of x
2507.553
boot
library(boot)
mean.boot <- function(data, i){
d <- data[i]
M <- mean(d)
V <- var(d)/length(i)
return(c(M, V))
}
boot.out <- boot(DF$Weight, mean.boot, R = 10^4)
boot.ci(boot.out, conf = 0.95, type = c("perc", "stud"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.out, conf = 0.95, type = c("perc", "stud"))
Intervals :
Level Studentized Percentile
95% (2330, 2664 ) (2342, 2665 )
Calculations and Intervals on Original Scale