Goal: Explore the Taskmaster data across all completed series. Which contestants did worst? Which contestants did best? Did the scoring change over the series?
library(rvest)
library(tidyverse)
tm <- read_html("https://taskmaster.fandom.com/wiki/Series_11")
tm
{html_document}
<html class="client-nojs" lang="en" dir="ltr">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
[2] <body class="mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-subject ...
Here is a basic HTML page:
<html>: start of the HTML page
<head>: header information (metadata about the page)
<body>: everything that is on the page
<p>: paragraphs
<b>: bold
<table>: table
The Taskmaster data we want looks like it is stored in a table. How can we extract it?
{xml_nodeset (4)}
[1] <table style="width: 100%; text-align: center; border: 1px solid #891100; ...
[2] <table class="toccolours" align="center" style="background: #891100; colo ...
[3] <table class="pi-horizontal-group">\n<caption class="pi-header pi-seconda ...
[4] <table class="tmtable"><tbody>\n<tr class="tmtableheader">\n<th>Task\n</t ...
html_elements() returns all the elements matching the selector.
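A nodeset like the one printed above is what a call such as tm |> html_elements("table") would be expected to return. The sketch below illustrates the difference between html_elements() and html_element() on a small made-up page built with rvest's minimal_html(), so it runs without touching the wiki. The toy HTML and the names page, all_tables, and first_table are assumptions for illustration only.

```r
library(rvest)

# Toy page with two tables (hypothetical content, not the real wiki page)
page <- minimal_html('
  <table class="toccolours"><tr><td>Navigation</td></tr></table>
  <table class="tmtable"><tr><th>Task</th></tr><tr><td>1</td></tr></table>
')

# html_elements() returns every element matching the selector, as a nodeset
all_tables <- page |> html_elements("table")
length(all_tables)

# html_element() returns only the first match
first_table <- page |> html_element("table")
html_attr(first_table, "class")
```

Because html_elements() returns every match, a more specific selector is usually needed to isolate a single table.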
How do we know which table we want?
# A tibble: 75 × 7
Task Description `Charlotte Ritchie` `Jamali Maddix` `Lee Mack`
<chr> <chr> <chr> <chr> <chr>
1 Episode 1: It's n… Episode 1:… Episode 1: It's no… Episode 1: It'… Episode 1…
2 1 Prize: Bes… 1 2 4
3 2 Do the mos… 2 3[1] 3
4 3 Catch the … DQ 1 5
5 4 Deliver al… 2 1 5
6 5 Live: Stac… 0 0 0
7 Total Total 5 7 17
8 Episode 2: The Lu… Episode 2:… Episode 2: The Lur… Episode 2: The… Episode 2…
9 1 Prize: Bes… 5 1 2
10 2 Make the b… 0 5 0
# ℹ 65 more rows
# ℹ 2 more variables: `Mike Wozniak` <chr>, `Sarah Kendall` <chr>
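One way to pick out the right table is by its class attribute: in the nodeset shown earlier, only the fourth table has class "tmtable". The sketch below assumes that class identifies the scores table and uses a made-up page (hypothetical contestant name and scores) to show a class selector combined with html_table(), which converts an HTML table into a tibble.

```r
library(rvest)

# Toy page: a navigation table plus a scores table (all values are made up)
page <- minimal_html('
  <table class="toccolours"><tr><td>Navigation</td></tr></table>
  <table class="tmtable">
    <tr class="tmtableheader"><th>Task</th><th>Description</th><th>Contestant A</th></tr>
    <tr><td>1</td><td>Prize task</td><td>4</td></tr>
    <tr><td>2</td><td>Team task</td><td>3</td></tr>
  </table>
')

# ".tmtable" matches elements whose class is "tmtable";
# html_table() then turns that single table into a tibble
scores <- page |>
  html_element(".tmtable") |>
  html_table()

scores
```

On the real page the score columns come back as character, since cells such as "DQ" and "3[1]" are not plain numbers.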
How would we scrape Charlotte Ritchie’s birthday?
read_html("https://taskmaster.fandom.com/wiki/Charlotte_Ritchie") |>
html_element("[data-source='born']")
{html_node}
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="born">
[1] <h3 class="pi-data-label pi-secondary-font">Born</h3>
[2] <div class="pi-data-value pi-font">29 August 1989</div>
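The node above still contains both the label and the value. A possible next step, sketched here on a toy copy of that infobox structure rather than the live page, is to select the inner pi-data-value element and pull out its text with html_text2(). The toy HTML and the name born are assumptions for illustration.

```r
library(rvest)

# Toy infobox mirroring the structure printed above
page <- minimal_html('
  <div class="pi-item pi-data" data-source="born">
    <h3 class="pi-data-label">Born</h3>
    <div class="pi-data-value pi-font">29 August 1989</div>
  </div>
')

# Attribute selector for the "born" block, then the descendant holding the value
born <- page |>
  html_element("[data-source='born'] .pi-data-value") |>
  html_text2()

born
```

The result is a character string, so it would still need to be parsed if a Date is wanted.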
{xml_nodeset (436)}
[1] <td>\n<table class="toccolours" align="center" style="background: #89110 ...
[2] <td>\n<a href="/wiki/Series_10" title="Series 10"><span style="color: #F ...
[3] <td align="center">\n<span style="font-family: Veteran Typewriter;"><a h ...
[4] <td class="pi-horizontal-group-item pi-data-value pi-font pi-border-colo ...
[5] <td class="pi-horizontal-group-item pi-data-value pi-font pi-border-colo ...
[6] <td colspan="7">Episode 1: <span style="font-family: Veteran Typewriter; ...
[7] <td>\n<a href="/wiki/Best_thing_you_can_carry,_but_only_just" title="Bes ...
[8] <td>\n<b>Prize:</b> Best thing you can carry, but only just.\n</td>
[9] <td>1\n</td>
[10] <td>2\n</td>
[11] <td>4\n</td>
[12] <td>\n<b>5</b>\n</td>
[13] <td>3\n</td>
[14] <td>\n<a href="/wiki/Do_the_most_impressive_thing_under_the_table_with_o ...
[15] <td>Do the most impressive thing under the table with one hand. You must ...
[16] <td>2\n</td>
[17] <td>3<sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</ ...
[18] <td>3\n</td>
[19] <td>\n<b>5</b>\n</td>
[20] <td>4\n</td>
...
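A nodeset this large is what selecting every table cell, for example with html_elements("td"), would be expected to give. The sketch below uses a made-up table (hypothetical episode names) to show how a broad "td" selector compares with a narrower one that uses an attribute match and the child combinator ">".

```r
library(rvest)

# Toy table: two episode-title cells and two score cells (made-up content)
page <- minimal_html('
  <table>
    <tr><td align="center"><span><a href="/wiki/Episode_A">Episode A</a></span></td></tr>
    <tr><td>1</td><td>2</td></tr>
    <tr><td align="center"><span><a href="/wiki/Episode_B">Episode B</a></span></td></tr>
  </table>
')

# "td" matches every cell in the table
all_cells <- page |> html_elements("td")
length(all_cells)

# Only <span> elements that are direct children of a centred cell
episode_spans <- page |> html_elements("td[align='center'] > span")
length(episode_spans)
```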
read_html("https://taskmaster.fandom.com/wiki/Series_11") |>
html_elements("td[align='center'] > span")
{xml_nodeset (10)}
[1] <span style="font-family: Veteran Typewriter;"><a href="/wiki/It%27s_not ...
[2] <span style="font-family: Veteran Typewriter;"><a href="/wiki/The_Lure_o ...
[3] <span style="font-family: Veteran Typewriter;"><a href="/wiki/Run_up_a_t ...
[4] <span style="font-family: Veteran Typewriter;"><a href="/wiki/Premature_ ...
[5] <span style="font-family: Veteran Typewriter;"><a href="/wiki/Slap_and_t ...
[6] <span style="font-family: Veteran Typewriter;"><a href="/wiki/Absolute_c ...
[7] <span style="font-family: Veteran Typewriter;"><a href="/wiki/You%27ve_g ...
[8] <span style="font-family: Veteran Typewriter;"><a href="/wiki/An_orderly ...
[9] <span style="font-family: Veteran Typewriter;"><a href="/wiki/Mr_Octopus ...
[10] <span style="font-family: Veteran Typewriter;"><a href="/wiki/Activate_J ...
read_html("https://taskmaster.fandom.com/wiki/Series_11") |>
html_elements("td[align='center'] > span") |>
html_element("a")
{xml_nodeset (10)}
[1] <a href="/wiki/It%27s_not_your_fault." title="It's not your fault.">It's ...
[2] <a href="/wiki/The_Lure_of_the_Treacle_Puppies." title="The Lure of the ...
[3] <a href="/wiki/Run_up_a_tree_to_the_moon." title="Run up a tree to the m ...
[4] <a href="/wiki/Premature_conker." title="Premature conker.">Premature co ...
[5] <a href="/wiki/Slap_and_tong." title="Slap and tong.">Slap and tong.</a>
[6] <a href="/wiki/Absolute_casserole." title="Absolute casserole.">Absolute ...
[7] <a href="/wiki/You%27ve_got_no_chutzpah." title="You've got no chutzpah. ...
[8] <a href="/wiki/An_orderly_species." title="An orderly species.">An order ...
[9] <a href="/wiki/Mr_Octopus_and_Pottyhands." title="Mr Octopus and Pottyha ...
[10] <a href="/wiki/Activate_Jamali." title="Activate Jamali.">Activate Jamal ...
read_html("https://taskmaster.fandom.com/wiki/Series_11") |>
html_elements("td[align='center'] > span") |>
html_element("a") |>
html_attr("href")
[1] "/wiki/It%27s_not_your_fault."
[2] "/wiki/The_Lure_of_the_Treacle_Puppies."
[3] "/wiki/Run_up_a_tree_to_the_moon."
[4] "/wiki/Premature_conker."
[5] "/wiki/Slap_and_tong."
[6] "/wiki/Absolute_casserole."
[7] "/wiki/You%27ve_got_no_chutzpah."
[8] "/wiki/An_orderly_species."
[9] "/wiki/Mr_Octopus_and_Pottyhands."
[10] "/wiki/Activate_Jamali."
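The href values above are relative paths, so they need the site root in front before they can be passed to read_html(). A minimal sketch, using a toy page containing one of the links above and the hypothetical names hrefs and episode_urls:

```r
library(rvest)

# Toy page with a single episode link, structured like the real one
page <- minimal_html('
  <table><tr><td align="center">
    <span><a href="/wiki/Slap_and_tong." title="Slap and tong.">Slap and tong.</a></span>
  </td></tr></table>
')

# Same pipeline as above: span inside a centred cell, then its link, then the href
hrefs <- page |>
  html_elements("td[align='center'] > span") |>
  html_element("a") |>
  html_attr("href")

# Prepend the site root to turn each relative path into a full URL
episode_urls <- paste0("https://taskmaster.fandom.com", hrefs)
episode_urls
```

Each element of episode_urls could then be read with read_html() to scrape the individual episode pages.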
https://sta279-f23.github.io/class_activities/ca_lecture_19.html