Instruction: Work with a neighbor to answer the following questions, then we will discuss the activity as a class. To get started, download the class activity template file.
In a previous class, we started a simulation to assess how important the normality assumption is in the simple linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
That is, how important is the assumption that $\varepsilon_i \sim N(0, \sigma^2)$?
So far, we have written the following code to simulate data for which the normality assumption is satisfied:
nsim <- 1000 # number of simulated datasets
n <- 100 # sample size
beta0 <- 0.5 # intercept
beta1 <- 1 # slope
results <- rep(NA, nsim) # TRUE if the i-th interval covers the true slope, FALSE otherwise
for(i in 1:nsim){
  x <- runif(n, min = 0, max = 1)
  noise <- rnorm(n, mean = 0, sd = 1)
  y <- beta0 + beta1*x + noise
  lm_mod <- lm(y ~ x)
  ci <- confint(lm_mod, "x", level = 0.95)
  results[i] <- ci[1] < beta1 & ci[2] > beta1 # does the CI contain the true slope?
}
mean(results) # estimated coverage: the proportion of intervals containing the true slope
In particular, the line noise <- rnorm(n, mean=0, sd=1)
ensures that the errors come from a normal distribution. Running this code, we find that the coverage of our confidence intervals is approximately 95%, as expected.
Now, we want to know how important it is that the errors $\varepsilon_i$ be normal. To address that question, we need to see what happens when $\varepsilon_i$ comes from a different distribution! We can simulate from many different distributions in R, including the following:
rt(...)
rexp(...)
rchisq(...)
rgamma(...)
runif(...)
Experiment with different distributions for the noise term $\varepsilon_i$ in the code above (one possible modification is sketched below). How does the confidence interval coverage change?
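For example, we might swap in exponential errors by replacing the rnorm() line as sketched here. One detail worth noting: rexp(n, rate = 1) has mean 1, so subtracting 1 keeps the errors centered at zero; otherwise we would be changing the mean of the errors, not just the shape of their distribution.

noise <- rexp(n, rate = 1) - 1 # exponential errors, shifted to have mean 0

Any of the other distributions listed above can be substituted in the same way; just be careful to center them if their mean is not already zero.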
Does confidence interval coverage depend on the sample size n?
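One way to explore this question (a sketch, not the only approach) is to wrap the simulation in a helper function, here called coverage(), that takes the sample size n as an argument, and then compare the estimated coverage across several values of n. The centered exponential errors from the sketch above are used for illustration; any other distribution could be substituted.

coverage <- function(n, nsim = 1000, beta0 = 0.5, beta1 = 1){
  results <- rep(NA, nsim)
  for(i in 1:nsim){
    x <- runif(n, min = 0, max = 1)
    noise <- rexp(n, rate = 1) - 1 # non-normal errors, centered at zero
    y <- beta0 + beta1*x + noise
    lm_mod <- lm(y ~ x)
    ci <- confint(lm_mod, "x", level = 0.95)
    results[i] <- ci[1] < beta1 & ci[2] > beta1 # does the CI cover the true slope?
  }
  mean(results) # estimated coverage
}

sapply(c(10, 30, 100, 500), coverage) # coverage estimate for each sample size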