Class activity

Airlines data

The airlines SQL database contains information on 48 million flights from the Bureau of Transportation Statistics (you have worked with a small subset of this data in the nycflights13 package). Information from the database can be obtained through SQL queries. For example, the flights contains the following tables:

SHOW TABLES;

4 records

Tables_in_airlines
airports
carriers
flights
planes

Connecting to the database

Make sure the mdsr and DBI packages are installed, then run the following code in R:

library(tidyverse)
library(mdsr)
library(DBI)
db <- dbConnect_scidb("airlines")

Let’s try a quick query to see that it works:

query <- "
SELECT name FROM carriers LIMIT 0,5;
"
dbGetQuery(db, query)
##                            name
## 1                 Titan Airways
## 2            Tradewind Aviation
## 3           Comlux Aviation, AG
## 4 Master Top Linhas Aereas Ltd.
## 5           Flair Airlines Ltd.

Here, we are querying SQL through R. We store the query in a string, and then use the dbGetQuery function to send that query to the database and return the results.

  1. Fill in the query below to select the tailnum and year columns from the planes table. For safety, we are only going to return the first 5 rows.
query <- "
SELECT ... FROM ... LIMIT 0,5;
"
dbGetQuery(db, query)
  1. Using the MIN() function, find the oldest date of manufacture (year) in the planes table.
query <- "
SELECT ... FROM ... LIMIT 0,5;
"
dbGetQuery(db, query)
  1. Using WHERE, find the tailnum of the oldest plane in the planes table.
query <- "
SELECT ... 
FROM ... 
WHERE ...
LIMIT 0,5;
"
dbGetQuery(db, query)

Questions 2 and 3 required two separate queries: first to find the oldest plane, then to find the tail number. We can be more efficient by using a subquery to choose only the years equal to the smallest year

query <- "
SELECT tailnum, year 
FROM planes 
WHERE year = (SELECT MIN(year) FROM planes)
LIMIT 0,5;
"
dbGetQuery(db, query)
##   tailnum year
## 1  N381AA 1956
  1. Modify the above code to find the newest manufacturing date, and calculate the number of planes which share the newest manufacturing date. You can use SUM(1) or COUNT(...) to count rows.