The airlines
SQL database contains information on 48 million flights from the Bureau of Transportation Statistics (you have worked with a small subset of this data in the nycflights13
package). Information from the database can be obtained through SQL queries. For example, the flights contains the following tables:
4 records
Tables_in_airlines |
---|
airports |
carriers |
flights |
planes |
Below is an SQL query on the database, and the output of the query. Working with your neighbor, try to determine what each piece of the query is doing. What would be the equivalent dplyr
code?
SELECT
name,
SUM(1) AS N,
SUM(arr_delay <= 15) / SUM(1) AS pct_ontime
FROM flights
JOIN carriers ON flights.carrier = carriers.carrier
WHERE year = 2016 AND month = 9
AND dest = 'JFK'
GROUP BY name
HAVING N >= 100
ORDER BY pct_ontime DESC
LIMIT 0,4;
4 records
name | N | pct_ontime |
---|---|---|
Delta Air Lines Inc. | 2396 | 0.8689 |
Virgin America | 347 | 0.8329 |
JetBlue Airways | 3463 | 0.8169 |
American Airlines Inc. | 1397 | 0.7817 |