Class Activity

Instructions: Work with a neighbor to answer the following questions. To get started, download the class activity template file.

In this activity, you will practice using dplyr functions for data manipulation.

Data

You will work with the flights data from the nycflights13 package. Run the following code in your console to learn more about the flights dataset:

library(nycflights13)
?flights

(You may need to install the nycflights13 package first, for example with install.packages("nycflights13").) You will also need to load the tidyverse package with library(tidyverse).

Questions

Flights departing late will probably arrive late. We can look at the relationship between departure delay and arrival delay with a scatterplot:
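One way such a scatterplot could be drawn is sketched below. It assumes ggplot2 (loaded as part of tidyverse) and the dep_delay and arr_delay columns described in ?flights; the object name delay_plot, the transparency setting, and the axis labels are illustrative choices, not part of the activity template.

```r
library(nycflights13)
library(tidyverse)

# Scatterplot of arrival delay against departure delay (both in minutes).
# Rows with a missing delay are dropped first so they are not passed to the plot.
delay_plot <- flights |>
  filter(!is.na(dep_delay), !is.na(arr_delay)) |>
  ggplot(aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.1) +  # semi-transparent points reduce overplotting
  labs(x = "Departure delay (minutes)",
       y = "Arrival delay (minutes)")

delay_plot
```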

One way to summarize the relationship between two quantitative variables is with the correlation, which measures the strength and direction of the linear relationship between the two variables. Correlation is a number between -1 and 1: a correlation close to -1 indicates a strong negative linear relationship, a correlation close to 1 indicates a strong positive linear relationship, and a correlation of 0 indicates no linear relationship (the variables may still be related in a nonlinear way).
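To make these values concrete, here is a small sketch with made-up vectors. The names x, y_up, y_down, and y_curve are illustrative and have nothing to do with the flights data. The last case shows that a correlation of about 0 rules out only a linear pattern: y_curve is completely determined by x, but the relationship is U-shaped.

```r
# Toy data: five evenly spaced x values
x <- c(1, 2, 3, 4, 5)

y_up    <- 2 * x + 1    # points on an increasing straight line
y_down  <- 10 - 3 * x   # points on a decreasing straight line
y_curve <- (x - 3)^2    # symmetric U-shape centered at x = 3

cor_up    <- cor(x, y_up)     # essentially 1
cor_down  <- cor(x, y_down)   # essentially -1
cor_curve <- cor(x, y_curve)  # essentially 0, despite the clear pattern

c(up = cor_up, down = cor_down, curve = cor_curve)
```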

  1. In R, we calculate correlation with the cor function. Fill in the code below to calculate the correlation between departure delay and arrival delay. (The argument use = "complete.obs" tells cor to ignore rows with an NA in either of the two variables.)

flights |>
  summarize(delay_cor = cor(..., ..., use = "complete.obs"))

  2. Does the correlation between departure delay and arrival delay depend on which airport the flight departs from? We can use the group_by function to group the data before calculating correlation. Fill in the code below to calculate the correlation for each airport (EWR, JFK, and LGA).

flights |>
  group_by(...) |>
  summarize(delay_cor = cor(..., ..., use = "complete.obs"))

  3. How does the amount of time gained vary across airlines? Fill in the following code to calculate the average time gained and the standard deviation of time gained for each airline.

flights |>
  mutate(time_gained = ...) |>
  group_by(...) |>
  summarize(avg_time_gained = ...,
            sd_time_gained = ...)

  4. Now let’s look more closely at the different airlines. Which airport is the most common departure airport for American Airlines (AA) flights? Fill in the following code (you may need to look up the documentation for the count function!).

flights |>
  filter(... == "AA") |>
  count(...)

  5. Occasionally, flights actually depart early. How many American Airlines flights departed early? Fill in the following code.

flights |>
  filter(carrier ..., dep_delay ...) |>
  count()