US PhDs
Notable topics: Data cleaning (getting messy data into tidy format)
Recorded on: 2019-02-21
Timestamps by: Alex Cookson
Screencast
Timestamps
Using read_xlsx function to read in Excel spreadsheet, including skipping first few rows that don't have data
Overview of starting very messy data
Using gather function to clean up wide dataset
Using fill function to fill in NA values with the entry from the previous observation
Cleaning a variable that has number and percent stacked on top of one another, using a combination of ifelse and fill functions
Using spread function on cleaned data to separate number and percent by year
Spotting a mistake where he had the wrong string in the str_detect function
Using sample function to get 6 random fields of study to graph
Cleaning another dataset, which is much easier to clean
Renaming the first field, even without knowing the exact name
Cleaning another dataset
Discussing challenge of when indentation is used in original dataset (for group / sub-group distinction)
Starting to separate out data that is appended to one another in the original dataset (all, male, female)
Removing field with long name using contains function
Using fct_recode function to rename an oddly-named category in a categorical variable (ifelse function is probably a better alternative)
Discussing solution to broad major field description and fine major field description (meaningfully indented in original data)
Using setdiff function to separate broad and fine major fields
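Example
The gather, ifelse, fill, and spread steps listed above can be sketched on a small made-up table. This is only an illustration, not the code from the screencast: the data, the column names (label, measure, field), and the values are all assumptions chosen to mimic a spreadsheet where a "Percent" row sits directly under each field's number row and each year is its own column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the real PhD dataset): each field name is
# followed by a "Percent" row, and years are spread across columns.
messy <- tibble(
  label  = c("Chemistry", "Percent", "Physics", "Percent"),
  `2016` = c(100, 10, 50, 5),
  `2017` = c(120, 12, 60, 6)
)

tidy <- messy %>%
  mutate(
    # Rows labelled "Percent" hold percentages; every other row holds counts
    measure = ifelse(label == "Percent", "Percent", "Number"),
    # Blank out the "Percent" labels so that fill() can copy the field name down
    field   = ifelse(label == "Percent", NA_character_, label)
  ) %>%
  fill(field) %>%                              # carry the last non-NA field downward
  select(-label) %>%
  gather(year, value, -field, -measure) %>%    # wide year columns -> one row per year
  mutate(year = as.integer(year)) %>%
  spread(measure, value) %>%                   # Number and Percent become columns
  arrange(field, year)

print(tidy)
```

The result has one row per field and year, with Number and Percent as separate columns. In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider, which could be swapped in for the same effect.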