African-American Achievements
Notable topics: plotly, interactive timeline, Wikipedia web scraping
Recorded on: 2020-06-08
Timestamps by: Eric Fletcher

Screencast

Timestamps
Use fct_reorder from the forcats package to reorder the category factor levels by sorting along n.
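A minimal sketch of this step, using toy data (the category names and counts here are illustrative, not the screencast's actual values):

```r
library(forcats)  # fct_reorder

df <- data.frame(
  category = c("Arts", "Science", "Sports"),
  n = c(5, 12, 8)
)

# Reorder the factor levels of category by the count column n
df$category <- fct_reorder(df$category, df$n)

levels(df$category)
#> [1] "Arts"    "Sports"  "Science"
```

With the levels ordered by n, downstream plots sort the categories by count instead of alphabetically.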
Use str_remove from the stringr package to remove anything after a bracket or parenthesis from the person variable with the regular expression "[\\[\\(].*". David then discusses how web scraping may be a better option than parsing the strings.
Use str_trim from the stringr package to remove the whitespace from the person variable.
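The two string-cleaning steps above can be sketched together; the example names below are stand-ins for the dataset's person column:

```r
library(stringr)

person <- c("Mary Jackson [note 1]", "Katherine Johnson (1918-2020)  ")

# Drop everything from the first "[" or "(" onward ...
person <- str_remove(person, "[\\[\\(].*")
# ... then trim the whitespace left behind
person <- str_trim(person)

person
#> [1] "Mary Jackson"      "Katherine Johnson"
```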
Create an interactive plotly timeline.
Use ylim(c(-.1, 1)) to set scale limits, moving the geom_point to the bottom of the graph.
Use paste0 from base R to concatenate the accomplishment and person with ": " in between, displayed in the timeline hover label.
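A rough sketch of the timeline at this stage, on toy data (the year, accomplishment, and person values are illustrative). The text aesthetic is not a standard ggplot2 aesthetic, so ggplot2 warns about it, but plotly picks it up for the hover label:

```r
library(ggplot2)

events <- data.frame(
  year = c(1940, 1962, 1983),
  accomplishment = c("Accomplishment A", "Accomplishment B", "Accomplishment C"),
  person = c("Person 1", "Person 2", "Person 3")
)

g <- ggplot(events,
            aes(year, 0, text = paste0(accomplishment, ": ", person))) +
  geom_point() +
  ylim(c(-.1, 1))  # points sit at y = 0, pinned to the bottom of the panel
```

Passing g to plotly::ggplotly() would then render the interactive version.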
Set y to category in the ggplot aesthetics to get 8 separate timelines on one plot, one for each category. Doing this allows David to remove the ylim mentioned above.
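A sketch of that change, again on illustrative toy data:

```r
library(ggplot2)

events <- data.frame(
  year = c(1940, 1962, 1983, 1990),
  category = c("Arts", "Science", "Sports", "Arts"),
  person = paste("Person", 1:4)
)

# Mapping y to category gives each category its own horizontal timeline,
# so the ylim() trick is no longer needed
g <- ggplot(events, aes(year, category, color = category)) +
  geom_point()
```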
Use the plotly tooltip = "text" argument to get just a single line of text in the plotly hover labels.
Use glue from the glue package to reformat text with \n included so that the single line of text can now be broken up into 2 separate lines in the hover labels.
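The glue step can be sketched on its own; the accomplishment and person values here are illustrative stand-ins for the dataset's columns:

```r
library(glue)

accomplishment <- "First African-American woman in space"
person <- "Mae Jemison"

# "\n" breaks the hover label onto a second line
label <- glue("{accomplishment}\n{person}")
```

Used as aes(text = glue(...)) and rendered with ggplotly(g, tooltip = "text"), the hover label shows only that text, split across two lines.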
Use separate_rows from the tidyr package to split the occupation_s variable from the science dataset into multiple rows, delimited by a semicolon with sep = "; ".
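A minimal sketch with a toy stand-in for the science dataset:

```r
library(tidyr)

science <- data.frame(
  name = c("Person A", "Person B"),
  occupation_s = c("Chemist; Inventor", "Mathematician")
)

# "Chemist; Inventor" is split into two rows, one occupation per row
longer <- separate_rows(science, occupation_s, sep = "; ")
```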
Use str_to_title from the stringr package to convert the case of the occupation_s variable to title case.
Use str_detect from the stringr package to detect the presence of "statistician" within the occupation_s variable with regex("statistician", ignore_case = TRUE) to perform a case-insensitive search.
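The two stringr steps above, sketched on illustrative occupation strings:

```r
library(stringr)

occupation_s <- c("mathematician; statistician", "ZOOLOGIST")

str_to_title(occupation_s)
#> [1] "Mathematician; Statistician" "Zoologist"

# regex(..., ignore_case = TRUE) matches regardless of capitalization
str_detect(occupation_s, regex("statistician", ignore_case = TRUE))
#> [1]  TRUE FALSE
```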
Use the rvest package with SelectorGadget to scrape additional information about each individual from their Wikipedia infobox.
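A self-contained sketch of the parsing side of that step. The screencast scrapes live Wikipedia pages (using SelectorGadget to find the CSS selector); here a tiny inline HTML fragment stands in for an infobox:

```r
library(rvest)

page <- read_html('
  <table class="infobox">
    <tr><th>Born</th><td>1918</td></tr>
    <tr><th>Occupation</th><td>Mathematician</td></tr>
  </table>')

# Grab the infobox table, then pull out its label/value cells
infobox <- html_element(page, "table.infobox")
keys    <- html_text2(html_elements(infobox, "th"))
values  <- html_text2(html_elements(infobox, "td"))

setNames(values, keys)
#> Born "1918", Occupation "Mathematician" (a named character vector)
```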
Use map and possibly from the purrr package to separate the downloading of the data from the parsing of the useful information. David then turns the infobox extraction step into an anonymous function using the magrittr ". %>%" (dot-pipe) notation.
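A sketch of both ideas; parse_infobox is a hypothetical stand-in for the real extraction step:

```r
library(purrr)
library(magrittr)

# Hypothetical parser standing in for the infobox-extraction step
parse_infobox <- function(x) {
  if (is.null(x)) stop("no infobox on this page")
  x
}

# possibly() swaps errors for a default value, so one bad page
# doesn't abort the whole map()
safe_parse <- possibly(parse_infobox, otherwise = NA)
pages  <- list("infobox 1", NULL, "infobox 3")
parsed <- map(pages, safe_parse)  # parsed[[2]] is NA, not an error

# The ". %>%" idiom builds a magrittr functional sequence: an anonymous
# function whose body is the pipeline itself
shout <- . %>% toupper() %>% paste0("!")
shout("physicist")
#> [1] "PHYSICIST!"
```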
Summary of screencast.