Class Activity

Instruction: Work with a neighbor to answer the following questions, then we will discuss the activity as a class. To get started, download the class activity template file.

Simulation: the normality assumption in linear regression

In a previous class, we started a simulation to assess how important the normality assumption is in the simple linear regression model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

That is, how important is the assumption that $\varepsilon_i \sim N(0, \sigma^2)$?

So far, we have written the following code to simulate data for which the normality assumption is satisfied:

nsim <- 1000 # number of simulated datasets
n <- 100 # sample size
beta0 <- 0.5 # intercept
beta1 <- 1 # slope
results <- rep(NA, nsim) # does each CI cover the true slope?

for(i in 1:nsim){
  x <- runif(n, min=0, max=1)
  noise <- rnorm(n, mean=0, sd=1)
  y <- beta0 + beta1*x + noise

  lm_mod <- lm(y ~ x)
  ci <- confint(lm_mod, "x", level = 0.95)
  
  # record whether the CI contains the true slope (beta1 = 1)
  results[i] <- ci[1] < 1 & ci[2] > 1
}
mean(results) # estimated coverage

In particular, the line noise <- rnorm(n, mean=0, sd=1) ensures the errors come from a normal distribution. When we run this code, the coverage of our confidence intervals is approximately 95% (as expected).

Now, we want to know how important it is that the errors $\varepsilon_i$ be normal. To address that question, we need to see what happens when $\varepsilon_i$ comes from a different distribution! R provides functions to simulate from many different distributions, including rexp() (exponential), rt() (t distribution), rcauchy() (Cauchy), and runif() (uniform).
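For example, here is one possible modification of the simulation (a sketch; the choice of exponential errors is just one option) that replaces the rnorm line with mean-centered exponential noise, so that the errors still have mean 0 but are skewed rather than normal:

```r
nsim <- 1000 # number of simulated datasets
n <- 100     # sample size
beta0 <- 0.5 # intercept
beta1 <- 1   # slope
results <- rep(NA, nsim)

for(i in 1:nsim){
  x <- runif(n, min=0, max=1)
  # Exponential(rate = 1) has mean 1, so subtract 1 to center the errors at 0
  noise <- rexp(n, rate = 1) - 1
  y <- beta0 + beta1*x + noise

  lm_mod <- lm(y ~ x)
  ci <- confint(lm_mod, "x", level = 0.95)
  results[i] <- ci[1] < 1 & ci[2] > 1
}
mean(results)
```

Centering the noise matters: if the errors did not have mean 0, the intercept estimate would absorb the shift, but the comparison to the normal case would be less clean.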

Questions

  1. Experiment with different distributions for the noise term $\varepsilon_i$ in the code above. How does the confidence interval coverage change?

  2. Does confidence interval coverage depend on the sample size n?
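To explore question 2, one approach (a sketch; the particular sample sizes and the exponential noise distribution are just illustrative choices) is to wrap the simulation in a function of n and compare the estimated coverage across several sample sizes:

```r
# Estimated CI coverage for the slope when errors are mean-centered exponential
coverage <- function(n, nsim = 1000, beta0 = 0.5, beta1 = 1){
  results <- rep(NA, nsim)
  for(i in 1:nsim){
    x <- runif(n, min = 0, max = 1)
    noise <- rexp(n, rate = 1) - 1  # non-normal errors with mean 0
    y <- beta0 + beta1*x + noise
    ci <- confint(lm(y ~ x), "x", level = 0.95)
    results[i] <- ci[1] < beta1 & ci[2] > beta1
  }
  mean(results)
}

# Compare coverage across a range of sample sizes
sapply(c(10, 50, 100, 500), coverage)
```

If the central limit theorem is doing the work, we would expect coverage to be closer to 95% at larger sample sizes even when the errors are not normal.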