Warm-up

Airlines data

The airlines SQL database contains information on 48 million flights from the Bureau of Transportation Statistics (you have worked with a small subset of this data in the nycflights13 package). Information from the database can be obtained through SQL queries. For example, the flights contains the following tables:

SHOW TABLES;

4 records

Tables_in_airlines
airports
carriers
flights
planes

An example query

Below is an SQL query on the database, and the output of the query. Working with your neighbor, try to determine what each piece of the query is doing. What would be the equivalent dplyr code?

SELECT 
  name,
  SUM(1) AS N, 
  SUM(arr_delay <= 15) / SUM(1) AS pct_ontime
FROM flights
JOIN carriers ON flights.carrier = carriers.carrier
WHERE year = 2016 AND month = 9
  AND dest = 'JFK'
GROUP BY name 
HAVING N >= 100
ORDER BY pct_ontime DESC
LIMIT 0,4;

4 records

name N pct_ontime
Delta Air Lines Inc. 2396 0.8689
Virgin America 347 0.8329
JetBlue Airways 3463 0.8169
American Airlines Inc. 1397 0.7817