HBCU Enrollment
Data Cleaning
Notable topics: Data Cleaning
Recorded on: 2021-02-01
Timestamps by: Eric Fletcher
Screencast
Timestamps
Detect the presence or absence of a pattern in a string.
Separate a character column into multiple columns with a regular expression or numeric locations.
Rename column.
Select only unique/distinct rows from a data frame.
Expand the y axis plot limits by starting at 0.
Combine two datasets while including all rows in x and y.
Format y axis labels as percentages (2.5%, 50%, etc.).
Bind multiple data frames by row, with an explanation of why this is not the best approach for combining them given the other options.
Brief discussion on the differences between rbind and bind_rows.
Remove matched patterns in a string.
Turn variable names into 'snake case' (e.g. Standard Error becomes standard_error).
Mutate multiple columns to change their type from character to numeric, parsing out the numbers and dropping the other characters in the dataset.
Subset rows using their positions.
Reshape the data from wide to long such that there is one row for each year and race.
Compute the absolute value of x.
Remove matched patterns in a string (e.g. black1 becomes black, white1 becomes white).
Reorder factor levels in geom_line plot by sorting along another variable.
Bind multiple data frames by row.
Reorder factor levels by hand.
Detect and remove the presence of a pattern in a string to remove duplication from geom_line plot legend.
Reorder factor levels in geom_line plot by sorting along another variable, ordering by the last value so that the lines match the order shown in the legend: fct_reorder(race_ethnicity, percent, last, .desc = TRUE).
Import external Excel data set from Data.World.
Select variables that match a pattern to remove.
Unpack data in one column (field_gender) into two separate columns (field, gender).
Summary of screencast.
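Several of the cleaning steps listed above can be strung together in a short pipeline. The following is only a minimal sketch and not the code from the screencast: the toy table, its values, and the object names (wide, long, split_cols) are all made up for illustration. It assumes the tidyverse packages dplyr, tidyr, stringr, readr, and forcats are installed.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(readr)
library(forcats)

# Hypothetical stand-in for a wide enrollment table; every value is made up.
# Column names carry a trailing footnote digit, and numbers arrive as text.
wide <- tibble(
  Year   = c(1990, 2000),
  Black1 = c("1,000", "1,500"),
  White1 = c("8,000", "9,000")
)

long <- wide %>%
  # Lower-case the names (a simple stand-in for full snake_case cleaning).
  rename_with(str_to_lower) %>%
  # Convert every column except year from character to numeric.
  mutate(across(-year, parse_number)) %>%
  # Wide to long: one row for each year and race.
  pivot_longer(-year, names_to = "race", values_to = "enrollment") %>%
  # Remove the trailing footnote digit (black1 becomes black).
  mutate(race = str_remove(race, "1$")) %>%
  # Order the factor by each group's last value, largest first,
  # so a legend would line up with the right-hand end of the lines.
  mutate(race = fct_reorder(race, enrollment, last, .desc = TRUE))

# Unpack one combined column into two, splitting on the underscore.
split_cols <- tibble(field_gender = c("stem_women", "arts_men")) %>%
  separate(field_gender, c("field", "gender"), sep = "_")

# Keep only rows whose race label matches a pattern.
black_only <- long %>% filter(str_detect(race, "black"))
```

From here, long could be passed to ggplot with geom_line, and the factor order set by fct_reorder would control the legend order.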