More Bootstrapping

library(tidyverse)
library(resampledata)
DT::datatable(TXBirths2004)

DF <- TXBirths2004 %>% 
  filter(Gender == "Male", Smoker == "No", Gestation < 37)
ggplot(data = DF, aes(x = Weight)) + 
  geom_density(fill = "purple", alpha = 0.5) + 
  theme_bw()

DF %>% 
  summarize(Mean = mean(Weight), SD = sd(Weight), n = n())

      Mean       SD  n
1 2507.553 729.4068 76

set.seed(58)
thetahat <- mean(DF$Weight)
thetahat

[1] 2507.553

SEthetahat <- sd(DF$Weight)/sqrt(sum(!is.na(DF$Weight)))
SEthetahat

[1] 83.6687

# T Bootstrap
sims <- 10^4
Tstar <- numeric(sims)
for(i in 1:sims){
  bs <- sample(DF$Weight, size = sum(!is.na(DF$Weight)), replace = TRUE)
  Tstar[i] <- (mean(bs) - thetahat)/(sd(bs)/sqrt(sum(!is.na(DF$Weight))))
}
ggplot(data = data.frame(Tstar = Tstar), aes(x = Tstar)) +
  geom_density(fill = "purple", alpha = 0.5) + 
  theme_bw()

TS <- quantile(Tstar, prob = c(0.025, 0.975))
TS

     2.5%     97.5% 
-1.870836  2.150859

# Compare to
qt(c(0.025, 0.975), sum(!is.na(DF$Weight)) - 1)

[1] -1.992102  1.992102

Note that \(Q_1 = t^*_{0.025} = -1.8708359\), and \(Q_2 = t^*_{0.975} = 2.1508594\). The bootstrap \(t\) confidence interval is \[CI_{1 - \alpha}(\mu)=[\bar{X} - Q_2\cdot S/\sqrt{n}, \bar{X} - Q_1\cdot S/\sqrt{n}]\] or \[CI_{1 - \alpha}(\mu)=[\bar{X} - t^*_{1 - \alpha/2}\cdot S/\sqrt{n}, \bar{X} - t^*_{\alpha/2}\cdot S/\sqrt{n}]\]

In this case, a 95% bootstrap \(t\) confidence interval is:

\[CI_{1 - \alpha}(\mu)=[2507.5526316 - 2.1508594\cdot 83.6687, 2507.5526316 - -1.8708359\cdot 83.6687]\]

\[CI_{1 - \alpha}(\mu)= [2327.5930179, 2664.0830354]\]

Compare to a standard \(t\) confidence interval:

t.test(DF$Weight)


    One Sample t-test

data:  DF$Weight
t = 29.97, df = 75, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2340.876 2674.229
sample estimates:
mean of x 
 2507.553

Using package `boot`

library(boot)
mean.boot <- function(data, i){
  d <- data[i]
  M <- mean(d)
  V <- var(d)/length(i)
  return(c(M, V))
}
boot.out <- boot(DF$Weight, mean.boot, R = 10^4)
boot.ci(boot.out, conf = 0.95, type = c("perc", "stud"))

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot.out, conf = 0.95, type = c("perc", "stud"))

Intervals : 
Level    Studentized          Percentile     
95%   (2330, 2664 )   (2342, 2665 )  
Calculations and Intervals on Original Scale

More Bootstrapping

Alan T. Arnholt

Friday, October 27, 2023 - 11:20:51 AM.

Using package `boot`

More Bootstrapping

Alan T. Arnholt

Friday, October 27, 2023 - 11:20:51 AM.

Using package boot

Using package `boot`