Class activity solutions, August 28

Author

Ciaran Evans

The birthday problem

  1. Create a vector to store the days of the year, ignoring leap days (we will sample from it later):
days <- 1:365
  2. Choose a birthday for each of the 30 students, sampling with replacement so that two students can share a day:
set.seed(33)
n_students <- 30
birthdays <- sample(days, n_students, replace=TRUE)
  3. Are there 30 unique birthdays, or do we have a repeated birthday? If there are fewer unique birthdays than students, at least one birthday is repeated:
length(unique(birthdays)) < n_students
[1] TRUE

We have at least one repeated birthday!
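To see which days are shared, rather than only whether a repeat exists, base R's duplicated() and table() can help. The following is a small sketch that is not part of the original solution; it redraws the sample with the same seed so that it runs on its own, and the names has_repeat, birthday_counts, and repeat_counts are chosen here for illustration.

```r
set.seed(33)
days <- 1:365
n_students <- 30
birthdays <- sample(days, n_students, replace = TRUE)

# TRUE if any birthday appears more than once
# (equivalent to length(unique(birthdays)) < n_students)
has_repeat <- any(duplicated(birthdays))
has_repeat

# Count how often each day was drawn and keep only days drawn more than once
birthday_counts <- table(birthdays)
repeat_counts <- birthday_counts[birthday_counts > 1]
repeat_counts
```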

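  4. Now let’s repeat the simulation many times (10,000 here), recording in each run whether a birthday is repeated. The proportion of runs with a repeat estimates the probability: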
set.seed(33)

days <- 1:365 # days of the year
n_students <- 30

nsim <- 10000
results <- rep(NA, nsim) # store the simulation results
for(i in 1:nsim){
  birthdays <- sample(days, n_students, replace=TRUE)
  results[i] <- length(unique(birthdays)) < n_students
}

mean(results)
[1] 0.7077

In a class of 30 students, the estimated probability of at least one shared birthday is approximately 71%.
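The simulated estimate can be checked against the exact probability. Assuming 365 equally likely, independent birthdays (the same assumptions as the simulation), the probability that all n birthdays are distinct is (365/365)(364/365)...((365 - n + 1)/365), and the probability of at least one shared birthday is one minus that product. A minimal sketch follows; the helper name p_shared_exact is hypothetical and not part of the original activity.

```r
# Exact probability of at least one shared birthday among n people,
# assuming n_days equally likely, independent birthdays (no leap years).
# p_shared_exact is an illustrative helper, not from the original solution.
p_shared_exact <- function(n, n_days = 365) {
  if (n > n_days) return(1)  # more people than days: a repeat is certain
  # Probability that all n birthdays are distinct
  p_all_distinct <- prod((n_days - seq_len(n) + 1) / n_days)
  1 - p_all_distinct
}

p_shared_exact(30)  # about 0.706
```

The exact value of about 0.706 for 30 students is close to the simulated estimate, as expected with 10,000 repetitions.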

  5. How many students do we need for the probability to be approximately 50%? The answer is 23, which we can confirm by rerunning the simulation with n_students set to 23:
set.seed(213)

days <- 1:365 # days of the year
n_students <- 23

nsim <- 10000
results <- rep(NA, nsim) # store the simulation results
for(i in 1:nsim){
  birthdays <- sample(days, n_students, replace=TRUE)
  results[i] <- length(unique(birthdays)) < n_students
}

mean(results)
[1] 0.5076
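With 23 students, the estimated probability is approximately 51%. One way to arrive at 23 without guessing is to wrap the simulation in a function and try a range of class sizes. The sketch below assumes the same setup as above; the function name sim_shared_birthday and the range 20 to 26 are illustrative choices, not part of the original solution.

```r
# Estimate the probability of at least one shared birthday by simulation.
# sim_shared_birthday is an illustrative helper, not from the original solution.
sim_shared_birthday <- function(n_students, nsim = 10000, days = 1:365) {
  # Each replicate draws one class and checks for any repeated birthday;
  # anyDuplicated() returns 0 when there are no duplicates
  has_repeat <- replicate(
    nsim,
    anyDuplicated(sample(days, n_students, replace = TRUE)) > 0
  )
  mean(has_repeat)
}

set.seed(213)
class_sizes <- 20:26
estimates <- sapply(class_sizes, sim_shared_birthday)
names(estimates) <- class_sizes
round(estimates, 3)
```

Because each estimate carries simulation error of roughly half a percentage point at nsim = 10000, neighboring class sizes near 50% can be hard to separate; increasing nsim narrows that error.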