\[Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\]
How would you study the importance of the normality assumption?
To start, simulate data for which the normality assumption holds:
runif(n, min=0, max=1) samples \(X_i\) uniformly between 0 and 1
rnorm(n, mean=0, sd=1) samples \(\varepsilon_i \sim N(0, 1)\)
nsim <- 1000
n <- 100 # sample size
beta0 <- 0.5 # intercept
beta1 <- 1 # slope
results <- rep(NA, nsim)
for(i in 1:nsim){
x <- runif(n, min=0, max=1)
noise <- rnorm(n, mean=0, sd=1)
y <- beta0 + beta1*x + noise
lm_mod <- lm(y ~ x)
ci <- confint(lm_mod, "x", level = 0.95) # 95% CI for the slope
results[i] <- ci[1] < 1 & ci[2] > 1 # does the CI contain the true slope (beta1 = 1)?
}
mean(results)
[1] 0.952
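Because coverage is estimated from a finite number of simulated datasets, the estimate 0.952 comes with Monte Carlo error. A minimal sketch of how one could quantify that error, assuming the results vector and nsim from the code above (the 1.96 multiplier gives an approximate 95% interval for the true coverage):

p_hat <- mean(results) # estimated coverage
mc_se <- sqrt(p_hat * (1 - p_hat) / nsim) # approximate Monte Carlo standard error
c(estimate = p_hat,
  lower = p_hat - 1.96 * mc_se,
  upper = p_hat + 1.96 * mc_se)

With nsim = 1000, the Monte Carlo standard error is roughly 0.007, so an estimate of 0.952 is consistent with the nominal 95% coverage.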
\[Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\]
That is, how important is the assumption that \(\varepsilon_i \sim N(0, \sigma^2)\)?
Continue the simulation from last time, but experiment with different values of \(n\) and different distributions for the noise term (one way to organize this is sketched below).
https://sta279-f23.github.io/class_activities/ca_lecture_3.html
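One way to organize these experiments, not part of the original activity: wrap the simulation in a function whose arguments are the sample size and a noise-generating function. The name coverage_sim and its arguments are illustrative.

# Sketch: wrap the simulation so n and the noise distribution are easy to vary
coverage_sim <- function(n, noise_fun, nsim = 1000, beta0 = 0.5, beta1 = 1){
  results <- rep(NA, nsim)
  for(i in 1:nsim){
    x <- runif(n, min = 0, max = 1)
    noise <- noise_fun(n) # draw the errors from the supplied distribution
    y <- beta0 + beta1*x + noise
    ci <- confint(lm(y ~ x), "x", level = 0.95)
    results[i] <- ci[1] < beta1 & ci[2] > beta1 # CI covers the true slope?
  }
  mean(results) # estimated coverage
}

# Example: normal errors vs. skewed chi-square(1) errors at a smaller n
coverage_sim(n = 20, noise_fun = function(n) rnorm(n, mean = 0, sd = 1))
coverage_sim(n = 20, noise_fun = function(n) rchisq(n, df = 1))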
How does confidence interval coverage change when you change the distribution of \(\varepsilon_i\)?
nsim <- 1000
n <- 100 # sample size
beta0 <- 0.5 # intercept
beta1 <- 1 # slope
results <- rep(NA, nsim)
for(i in 1:nsim){
x <- runif(n, min=0, max=1)
noise <- rchisq(n, 1) # skewed errors: chi-square with 1 df instead of normal
y <- beta0 + beta1*x + noise
lm_mod <- lm(y ~ x)
ci <- confint(lm_mod, "x", level = 0.95)
results[i] <- ci[1] < 1 & ci[2] > 1
}
mean(results)
[1] 0.963
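To see whether the sample size matters for the skewed-noise case, one could repeat the chi-square simulation over a grid of \(n\) values. A sketch reusing the illustrative coverage_sim() helper defined above (the specific sample sizes are arbitrary):

# Coverage with chi-square(1) errors across several sample sizes
ns <- c(10, 30, 100, 500)
coverage_by_n <- sapply(ns, function(n_i)
  coverage_sim(n = n_i, noise_fun = function(m) rchisq(m, df = 1)))
data.frame(n = ns, coverage = coverage_by_n)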
For the normal errors simulation study: