Instructions: Work with a neighbor to answer the following questions. To get started, download the class activity template file.
In this activity, you will practice using dplyr functions for data manipulation.
You will work with the flights data from the nycflights13 package. Run the following code in your console to learn more about the flights dataset:
(You may need to install the nycflights13 package first.) You will also need to load the tidyverse package.
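The setup commands themselves are not shown above, so the following is only a sketch of what they might look like. It assumes the help page is opened with `?flights`; the `glimpse()` call is an optional extra and not part of the original instructions.

```r
# Uncomment the next line if nycflights13 is not installed yet
# install.packages("nycflights13")

library(tidyverse)     # loads dplyr, ggplot2, and related packages
library(nycflights13)  # provides the flights data frame

?flights               # open the documentation for the flights dataset
glimpse(flights)       # compact overview of the columns and their types
```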
Flights departing late will probably arrive late. We can look at the relationship between departure delay and arrival delay with a scatterplot:
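The plotting code is not shown above; one possible version, assuming ggplot2 (loaded with the tidyverse) and the `dep_delay` and `arr_delay` columns of `flights`, is sketched below. The object name `delay_plot`, the transparency setting, and the axis labels are illustrative choices.

```r
library(tidyverse)
library(nycflights13)

# Sketch: arrival delay against departure delay, one point per flight.
# Rows with a missing delay are dropped first so ggplot2 does not warn about them.
delay_plot <- flights |>
  filter(!is.na(dep_delay), !is.na(arr_delay)) |>
  ggplot(aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.2) +   # semi-transparent points reduce overplotting
  labs(x = "Departure delay (minutes)",
       y = "Arrival delay (minutes)")

delay_plot
```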
One way to summarize the relationship between two quantitative variables is with the correlation, which measures the strength of the linear relationship between the two variables. Correlation is a number between -1 and 1; a correlation close to -1 indicates a strong negative relationship, while a correlation close to 1 indicates a strong positive relationship, and a correlation of 0 indicates no linear relationship.
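To make these values concrete, here is a small sketch using base R's `cor()` on made-up vectors (the data and variable names are hypothetical and are not taken from `flights`). It also shows how a missing value affects the result, and what `use = "complete.obs"` changes.

```r
# Hypothetical toy data, chosen so the correlations are easy to check by hand
x       <- c(1, 2, 3, 4, 5)
y_pos   <- c(2, 4, 6, 8, 10)   # increases exactly linearly with x
y_neg   <- c(10, 8, 6, 4, 2)   # decreases exactly linearly with x
y_curve <- c(4, 1, 0, 1, 4)    # equals (x - 3)^2: related to x, but not linearly

r_pos   <- cor(x, y_pos)       # perfect positive linear relationship: 1
r_neg   <- cor(x, y_neg)       # perfect negative linear relationship: -1
r_curve <- cor(x, y_curve)     # no linear relationship: 0

# A single NA makes the default result NA ...
x_na   <- c(1, 2, NA, 4, 5)
r_na   <- cor(x_na, y_pos)
# ... unless incomplete pairs are dropped first
r_comp <- cor(x_na, y_pos, use = "complete.obs")
```

The curved example is a reminder that a correlation of 0 rules out a linear pattern only; the two variables can still be strongly related.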
(use = "complete.obs" means “Ignore the rows with NAs in either of the two variables”):
Use the group_by function to group the data before calculating correlation. Fill in the code below to calculate the correlation for each airport (EWR, JFK, and LGA).
flights |>
  mutate(time_gained = ...) |>
  group_by(...) |>
  summarize(avg_time_gained = ...,
            sd_time_gained = ...)
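As a rough illustration of the group-then-summarize pattern used in the templates above, here is a sketch on a small made-up data frame. The `toy` data, its column names, and the summary names are all hypothetical; it is not a solution to the `flights` questions.

```r
library(dplyr)

# Hypothetical toy data: two groups with opposite trends
toy <- tibble(
  group = c("a", "a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 1, 2, 3),
  y     = c(2, 4, 6, 3, 2, 1)
)

# One row of summaries per group
toy_summary <- toy |>
  group_by(group) |>
  summarize(
    n      = n(),                               # rows in each group
    mean_y = mean(y),
    sd_y   = sd(y),
    cor_xy = cor(x, y, use = "complete.obs")    # correlation within each group
  )

toy_summary

# count() is a shortcut for group_by() followed by summarize(n = n())
toy_counts <- toy |> count(group)
```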
count function!).