Instructions: Work with a neighbor to answer the following questions. To get started, download the class activity template file.
In this activity, you will practice using dplyr functions for data manipulation.
You will work with the flights data from the nycflights13 package. Run the following code in your console to learn more about the flights dataset:
(You may need to install the nycflights13 package first.) You will also need to load the tidyverse package.
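A minimal setup sketch is shown here, since the code itself is not reproduced above. It assumes that `?flights` and `glimpse()` are acceptable ways to explore the data; these calls are an illustration, not necessarily the code from the original activity.

```r
# One-time install, only if the packages are not already available:
# install.packages(c("tidyverse", "nycflights13"))

library(tidyverse)     # loads dplyr, ggplot2, and related packages
library(nycflights13)  # provides the flights data frame

# Open the help page describing each variable in flights
?flights

# Print a compact overview of the columns and their types
glimpse(flights)
```

The help page is the quickest way to check the units of `dep_delay` and `arr_delay` before plotting them.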
Flights departing late will probably arrive late. We can look at the relationship between departure delay and arrival delay with a scatterplot:
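One possible version of this scatterplot is sketched below. The transparency setting and the axis labels are assumptions added for readability, and dropping rows with missing delays first is a choice made here to avoid a warning about removed rows.

```r
library(tidyverse)
library(nycflights13)

# Scatterplot of arrival delay against departure delay.
# Rows with a missing delay (e.g., cancelled flights) are dropped first.
delay_plot <- flights |>
  filter(!is.na(dep_delay), !is.na(arr_delay)) |>
  ggplot(aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.1) +  # many points overlap, so make them transparent
  labs(x = "Departure delay (minutes)",
       y = "Arrival delay (minutes)")

delay_plot
```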
One way to summarize the relationship between two quantitative variables is with the correlation, which measures the strength of the linear relationship between the two variables. Correlation is a number between -1 and 1; a correlation close to -1 indicates a strong negative linear relationship, a correlation close to 1 indicates a strong positive linear relationship, and a correlation of 0 indicates no linear relationship (the variables may still be related in a nonlinear way).
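To make these values concrete, here is a small sketch using base R's `cor` function on made-up vectors (the vectors `x`, `y_up`, and `y_down` are hypothetical and exist only for illustration). It also shows what happens when a value is missing, and one way to compute the overall correlation for the flights data.

```r
library(nycflights13)

# Made-up data: y_up rises exactly linearly with x, y_down falls exactly linearly
x      <- c(1, 2, 3, 4, NA)
y_up   <- c(2, 4, 6, 8, 10)
y_down <- c(-2, -4, -6, -8, -10)

# By default, a single NA makes the result NA
cor_default <- cor(x, y_up)

# use = "complete.obs" drops any pair where either value is missing
cor_up   <- cor(x, y_up, use = "complete.obs")    # perfect positive relationship
cor_down <- cor(x, y_down, use = "complete.obs")  # perfect negative relationship

# The same call on the two delay columns of flights (all airports together)
cor_delays <- cor(flights$dep_delay, flights$arr_delay, use = "complete.obs")
cor_delays
```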
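In R, the cor function computes the correlation. The argument use = "complete.obs" means "ignore the rows with NAs in either of the two variables".
Use the group_by function to group the data before calculating correlation. Fill in the code below to calculate the correlation for each airport (EWR, JFK, and LGA).
The following template creates a time_gained variable, groups the data, and summarizes the average and standard deviation of time gained. Replace each ... with your own code:
flights |>
  mutate(time_gained = ...) |>
  group_by(...) |>
  summarize(avg_time_gained = ...,
            sd_time_gained = ...)
(Hint: use the count function!)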
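The general pattern behind these exercises (`group_by`, then `summarize`, and `count` for tallying rows) can be sketched on a small made-up table. The tibble `toy` and its columns `group` and `value` are hypothetical and deliberately unrelated to the flights questions, so the blanks above are still yours to fill in.

```r
library(dplyr)

# Made-up data: two groups with a handful of values each
toy <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# One row per group, with the mean and standard deviation of value
toy_summary <- toy |>
  group_by(group) |>
  summarize(avg_value = mean(value),
            sd_value = sd(value))

# count() tallies the number of rows in each group
toy_counts <- toy |>
  count(group)

toy_summary
toy_counts
```

Note that `summarize` returns one row per group, while `mutate` keeps every original row; that difference determines which one belongs at each step of a pipeline.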