Lecture 1: Intro to simulation

Warm-up question

Problem: 10 people are at a party, and all of them are wearing hats. They each place their hat in a pile; when they leave, they choose a hat at random. What is the probability at least one person selected the correct hat?

Question: Work with your neighbor to discuss the following question:

  • Without calculating probabilities, how could you design an experiment to estimate this probability?

Designing an experiment

Step 1: representing the hats

hats <- 1:10

hats
 [1]  1  2  3  4  5  6  7  8  9 10
hats[3]
[1] 3
  • hats is a vector, containing the numbers 1 to 10
  • entries in a vector are accessed by their index

Step 2: everyone draws a random hat

hats <- 1:10
randomized_hats <- sample(hats, size = 10, replace = FALSE)

hats
 [1]  1  2  3  4  5  6  7  8  9 10
randomized_hats
 [1]  8  2  4  5  9  7  1 10  3  6
  • The sample function creates a random sample from a vector
  • How many people selected their original hat?

Step 3: check who got their original hat

hats <- 1:10
randomized_hats <- sample(hats, size = 10, replace = FALSE)
hats
 [1]  1  2  3  4  5  6  7  8  9 10
randomized_hats
 [1]  8  2  4  5  9  7  1 10  3  6
hats == randomized_hats
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# TRUE is 1, FALSE is 0
sum(hats == randomized_hats)
[1] 1
# did at least one person get their hat?
sum(hats == randomized_hats) > 0
[1] TRUE

Code so far

hats <- 1:10
randomized_hats <- sample(hats, size = 10, replace = FALSE)
sum(hats == randomized_hats) > 0
[1] TRUE
  • Is this a good estimate of the probability?

Step 4: iteration

A for loop repeats code many times:

nsim <- 10000 # number of simulations 
for(i in 1:nsim){
  
  
}

Step 4: iteration

A for loop repeats code many times:

nsim <- 10000 # number of simulations
hats <- 1:10
results <- rep(NA, nsim) # vector to store results

for(i in 1:nsim){
  randomized_hats <- sample(hats, size = 10, replace = FALSE)
  results[i] <- sum(hats == randomized_hats) > 0
}

head(results)
[1] FALSE FALSE  TRUE FALSE FALSE FALSE

Step 4: iteration

A for loop repeats code many times:

nsim <- 10000 # number of simulations
hats <- 1:10
results <- rep(NA, nsim) # vector to store results

for(i in 1:nsim){
  randomized_hats <- sample(hats, size = 10, replace = FALSE)
  results[i] <- sum(hats == randomized_hats) > 0
}

mean(results)
[1] 0.6373
  • What if I wanted to repeat the simulation, with a different number of people?

Removing magic numbers

Without magic numbers:

nsim <- 10000 # number of simulations
M <- 10 # number of people
hats <- 1:M
results <- rep(NA, nsim) # vector to store results

for(i in 1:nsim){
  randomized_hats <- sample(hats, 
                            size = M, 
                            replace = FALSE)
  results[i] <- sum(hats == 
                      randomized_hats) > 0
}

mean(results)
[1] 0.6285
  • Why did I get different results?

Final code

set.seed(3) # set a seed for reproducibility

M <- 10 # number of people at the party
hats <- 1:M # numbered hats
nsim <- 10000 # number of simulations
results <- rep(NA, nsim) # vector to store the results

for(i in 1:nsim){
  # hats are randomly assigned to each person
  randomized_hats <- sample(hats, M, replace = F)
  
  # did at least one person get their hat back?
  results[i] <- sum(randomized_hats == hats) > 0
}

mean(results)

Summary of coding practices

  • avoid magic numbers
  • set a seed for reproducibility
  • use meaningful names
  • add comments

Class activity

Work with a neighbor on the class activity (link below and on the course website):

https://sta279-f23.github.io/class_activities/ca_lecture_1.html