African-American Achievements
Notable topics: plotly, interactive timeline, Wikipedia web scraping
Recorded on: 2020-06-08
Timestamps by: Eric Fletcher

Screencast

Timestamps
Use fct_reorder from the forcats package to reorder the category factor levels by sorting along n.
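A minimal sketch of this step, using toy data (the category names and counts here are illustrative, not the screencast's actual values):

```r
library(forcats)  # fct_reorder

df <- data.frame(
  category = c("Arts", "Science", "Sports"),
  n = c(5, 12, 8)
)

# Reorder the factor levels of category by the count column n
df$category <- fct_reorder(df$category, df$n)

levels(df$category)
#> [1] "Arts"    "Sports"  "Science"
```

With the levels ordered by n, downstream plots sort the categories by count instead of alphabetically.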
Use str_remove from the stringr package to remove anything after a bracket or parenthesis from the person variable with the regular expression "[\\[\\(].*". David then discusses how web scraping may be a better option than parsing the strings.
Use str_trim from the stringr package to remove the whitespace from the person variable.
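The two string-cleaning steps above can be sketched together; the example names below are stand-ins for the dataset's person column:

```r
library(stringr)

person <- c("Mary Jackson [note 1]", "Katherine Johnson (1918-2020)  ")

# Drop everything from the first "[" or "(" onward ...
person <- str_remove(person, "[\\[\\(].*")
# ... then trim the whitespace left behind
person <- str_trim(person)

person
#> [1] "Mary Jackson"      "Katherine Johnson"
```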
Create an interactive plotly timeline.
Use ylim(c(-.1, 1)) to set scale limits, moving the geom_point to the bottom of the graph.
Use paste0 from base R to concatenate the accomplishment and person with ": " in between, displayed in the timeline hover label.
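A rough sketch of the timeline at this stage, on toy data (the year, accomplishment, and person values are illustrative). The text aesthetic is not a standard ggplot2 aesthetic, so ggplot2 warns about it, but plotly picks it up for the hover label:

```r
library(ggplot2)

events <- data.frame(
  year = c(1940, 1962, 1983),
  accomplishment = c("Accomplishment A", "Accomplishment B", "Accomplishment C"),
  person = c("Person 1", "Person 2", "Person 3")
)

g <- ggplot(events,
            aes(year, 0, text = paste0(accomplishment, ": ", person))) +
  geom_point() +
  ylim(c(-.1, 1))  # points sit at y = 0, pinned to the bottom of the panel
```

Passing g to plotly::ggplotly() would then render the interactive version.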
Set y to category in the ggplot aesthetics to get 8 separate timelines on one plot, one for each category. Doing this allows David to remove the ylim mentioned above.
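A sketch of that change, again on illustrative toy data:

```r
library(ggplot2)

events <- data.frame(
  year = c(1940, 1962, 1983, 1990),
  category = c("Arts", "Science", "Sports", "Arts"),
  person = paste("Person", 1:4)
)

# Mapping y to category gives each category its own horizontal timeline,
# so the ylim() trick is no longer needed
g <- ggplot(events, aes(year, category, color = category)) +
  geom_point()
```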
Use the plotly tooltip = "text" argument to get just a single line of text in the plotly hover labels.
Use glue from the glue package to reformat text with \n included so that the single line of text can now be broken up into 2 separate lines in the hover labels.
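The glue step can be sketched on its own; the accomplishment and person values here are illustrative stand-ins for the dataset's columns:

```r
library(glue)

accomplishment <- "First African-American woman in space"
person <- "Mae Jemison"

# "\n" breaks the hover label onto a second line
label <- glue("{accomplishment}\n{person}")
```

Used as aes(text = glue(...)) and rendered with ggplotly(g, tooltip = "text"), the hover label shows only that text, split across two lines.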
Use separate_rows from the tidyr package to split the occupation_s variable from the science dataset into multiple rows, delimited by a semicolon with sep = "; ".
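A minimal sketch with a toy stand-in for the science dataset:

```r
library(tidyr)

science <- data.frame(
  name = c("Person A", "Person B"),
  occupation_s = c("Chemist; Inventor", "Mathematician")
)

# "Chemist; Inventor" is split into two rows, one occupation per row
longer <- separate_rows(science, occupation_s, sep = "; ")
```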
Use str_to_title from the stringr package to convert the case of the occupation_s variable to title case.
Use str_detect from the stringr package to detect the presence of "statistician" within the occupation_s variable with regex("statistician", ignore_case = TRUE) to perform a case-insensitive search.
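The two stringr steps above, sketched on illustrative occupation strings:

```r
library(stringr)

occupation_s <- c("mathematician; statistician", "ZOOLOGIST")

str_to_title(occupation_s)
#> [1] "Mathematician; Statistician" "Zoologist"

# regex(..., ignore_case = TRUE) matches regardless of capitalization
str_detect(occupation_s, regex("statistician", ignore_case = TRUE))
#> [1]  TRUE FALSE
```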
Use the rvest package with SelectorGadget to scrape additional information about each individual from their Wikipedia infobox.
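A self-contained sketch of the parsing side of that step. The screencast scrapes live Wikipedia pages (using SelectorGadget to find the CSS selector); here a tiny inline HTML fragment stands in for an infobox:

```r
library(rvest)

page <- read_html('
  <table class="infobox">
    <tr><th>Born</th><td>1918</td></tr>
    <tr><th>Occupation</th><td>Mathematician</td></tr>
  </table>')

# Grab the infobox table, then pull out its label/value cells
infobox <- html_element(page, "table.infobox")
keys    <- html_text2(html_elements(infobox, "th"))
values  <- html_text2(html_elements(infobox, "td"))

setNames(values, keys)
#> Born "1918", Occupation "Mathematician" (a named character vector)
```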
Use map and possibly from the purrr package to separate the downloading of the data from the parsing of the useful information. David then turns the infobox extraction step into an anonymous function using the magrittr ". %>%" (dot-pipe) notation.
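A sketch of both ideas; parse_infobox is a hypothetical stand-in for the real extraction step:

```r
library(purrr)
library(magrittr)

# Hypothetical parser standing in for the infobox-extraction step
parse_infobox <- function(x) {
  if (is.null(x)) stop("no infobox on this page")
  x
}

# possibly() swaps errors for a default value, so one bad page
# doesn't abort the whole map()
safe_parse <- possibly(parse_infobox, otherwise = NA)
pages  <- list("infobox 1", NULL, "infobox 3")
parsed <- map(pages, safe_parse)  # parsed[[2]] is NA, not an error

# The ". %>%" idiom builds a magrittr functional sequence: an anonymous
# function whose body is the pipeline itself
shout <- . %>% toupper() %>% paste0("!")
shout("physicist")
#> [1] "PHYSICIST!"
```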
Summary of screencast.