Instructions: Work with a neighbor to answer the following questions. To get started, download the class activity template file.

GDP data

Run the following code to import data on GDP for each country (you may need to install the googlesheets4 package):

library(tidyverse)
library(googlesheets4)
gs4_deauth()
gdp_data <- read_sheet("https://docs.google.com/spreadsheets/d/1RctTQmKB0hzbm1E8rGcufYdMshRdhmYdeL29nXqmvsc/pub?gid=0")

Looking at the data, we can see that there is a column for the GDP in each year, and the first column contains country information:

head(gdp_data)

## # A tibble: 6 × 53
##   Income per person (f…¹ `1960` `1961` `1962` `1963` `1964` `1965` `1966` `1967`
##   <chr>                   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Abkhazia                  NA     NA     NA     NA     NA     NA     NA     NA 
## 2 Afghanistan               NA     NA     NA     NA     NA     NA     NA     NA 
## 3 Akrotiri and Dhekelia     NA     NA     NA     NA     NA     NA     NA     NA 
## 4 Albania                   NA     NA     NA     NA     NA     NA     NA     NA 
## 5 Algeria                 1280.  1085.   856.  1128.  1170.  1215.  1128.  1201.
## 6 American Samoa            NA     NA     NA     NA     NA     NA     NA     NA 
## # ℹ abbreviated name: ¹`Income per person (fixed 2000 US$)`
## # ℹ 44 more variables: `1968` <dbl>, `1969` <dbl>, `1970` <dbl>, `1971` <dbl>,
## #   `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>,
## #   `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>,
## #   `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>,
## #   `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>,
## #   `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, …

Use the pivot_longer function to reshape the GDP data so it looks like this:

## # A tibble: 7,988 × 3
##    country year    gdp
##    <chr>   <chr> <dbl>
##  1 Albania 1980  1061.
##  2 Albania 1981  1100.
##  3 Albania 1982  1111.
##  4 Albania 1983  1101.
##  5 Albania 1984  1065.
##  6 Albania 1985  1060.
##  7 Albania 1986  1092.
##  8 Albania 1987  1054.
##  9 Albania 1988  1014.
## 10 Albania 1989  1092.
## # ℹ 7,978 more rows

There should be no NAs
The first column should be renamed country
There should be separate columns for year and gdp

WHO data

The tidyr package contains a dataset called who2 which contains information on tuberculosis counts for each country in each year, broken down by method of diagnosis, sex, and age group:

who2

## # A tibble: 7,240 × 58
##    country      year sp_m_014 sp_m_1524 sp_m_2534 sp_m_3544 sp_m_4554 sp_m_5564
##    <chr>       <dbl>    <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
##  1 Afghanistan  1980       NA        NA        NA        NA        NA        NA
##  2 Afghanistan  1981       NA        NA        NA        NA        NA        NA
##  3 Afghanistan  1982       NA        NA        NA        NA        NA        NA
##  4 Afghanistan  1983       NA        NA        NA        NA        NA        NA
##  5 Afghanistan  1984       NA        NA        NA        NA        NA        NA
##  6 Afghanistan  1985       NA        NA        NA        NA        NA        NA
##  7 Afghanistan  1986       NA        NA        NA        NA        NA        NA
##  8 Afghanistan  1987       NA        NA        NA        NA        NA        NA
##  9 Afghanistan  1988       NA        NA        NA        NA        NA        NA
## 10 Afghanistan  1989       NA        NA        NA        NA        NA        NA
## # ℹ 7,230 more rows
## # ℹ 50 more variables: sp_m_65 <dbl>, sp_f_014 <dbl>, sp_f_1524 <dbl>,
## #   sp_f_2534 <dbl>, sp_f_3544 <dbl>, sp_f_4554 <dbl>, sp_f_5564 <dbl>,
## #   sp_f_65 <dbl>, sn_m_014 <dbl>, sn_m_1524 <dbl>, sn_m_2534 <dbl>,
## #   sn_m_3544 <dbl>, sn_m_4554 <dbl>, sn_m_5564 <dbl>, sn_m_65 <dbl>,
## #   sn_f_014 <dbl>, sn_f_1524 <dbl>, sn_f_2534 <dbl>, sn_f_3544 <dbl>,
## #   sn_f_4554 <dbl>, sn_f_5564 <dbl>, sn_f_65 <dbl>, ep_m_014 <dbl>, …

Columns like sp_m_1524 contain the number of tuberculosis cases. The column names contain information about the method of diagnosis (rel, sn, sp, or ep), sex (f or m), and age group (014 = 0-14 years, 1524 = 15-24 years, etc.).

Ideally, we would like these variables to each have their own column.

Reshape the who2 data so it looks like this:

## # A tibble: 76,046 × 6
##    country      year diagnosis sex   age_group count
##    <chr>       <dbl> <chr>     <chr> <chr>     <dbl>
##  1 Afghanistan  1997 sp        m     014           0
##  2 Afghanistan  1997 sp        m     1524         10
##  3 Afghanistan  1997 sp        m     2534          6
##  4 Afghanistan  1997 sp        m     3544          3
##  5 Afghanistan  1997 sp        m     4554          5
##  6 Afghanistan  1997 sp        m     5564          2
##  7 Afghanistan  1997 sp        m     65            0
##  8 Afghanistan  1997 sp        f     014           5
##  9 Afghanistan  1997 sp        f     1524         38
## 10 Afghanistan  1997 sp        f     2534         36
## # ℹ 76,036 more rows

There should be no NAs
There should be separate columns for diagnosis, sex, and age_group
The count column contains the number of cases

household data

The household data from the tidyr package contains information about five different households and the children in each household:

household

## # A tibble: 5 × 5
##   family dob_child1 dob_child2 name_child1 name_child2
##    <int> <date>     <date>     <chr>       <chr>      
## 1      1 1998-11-26 2000-01-29 Susan       Jose       
## 2      2 1996-06-22 NA         Mark        <NA>       
## 3      3 2002-07-11 2004-04-05 Sam         Seth       
## 4      4 2004-10-10 2009-08-27 Craig       Khai       
## 5      5 2000-12-05 2005-02-28 Parker      Gracie

Using ".value" in the names_to argument of the pivot_longer function, reshape the household data to look like this:

## # A tibble: 9 × 4
##   family child  dob        name  
##    <int> <chr>  <date>     <chr> 
## 1      1 child1 1998-11-26 Susan 
## 2      1 child2 2000-01-29 Jose  
## 3      2 child1 1996-06-22 Mark  
## 4      3 child1 2002-07-11 Sam   
## 5      3 child2 2004-04-05 Seth  
## 6      4 child1 2004-10-10 Craig 
## 7      4 child2 2009-08-27 Khai  
## 8      5 child1 2000-12-05 Parker
## 9      5 child2 2005-02-28 Gracie

Class Activity

GDP data

WHO data

household data