Horror Movies
Notable topics: ANOVA, Text mining (tidytext package), LASSO regression (glmnet package)
Recorded on: 2019-10-21
Timestamps by: Alex Cookson
Screencast
Timestamps
Extracting digits (release year) from character string using regex, along with good explanation of extract function
Quick check on why parse_number is unable to parse some values -- is it because they are NA or some other reason?
Visually investigating correlation between budget and rating
Investigating correlation between MPAA rating (PG-13, R, etc.) and rating using boxplots
Using pull function to quickly check levels of a factor
Using ANOVA to check whether variation between groups (MPAA rating) is greater than variation within groups
Separating genre using separate_rows function (instead of str_split and unnest)
Removing boilerplate "Directed by..." and "With..." parts of plot variable to isolate the plot itself, first using regex, then using separate function with periods as separator
Unnesting word tokens, removing stop words, and counting appearances
Aggregating by word to find words that appear in high- or low-rated movies
Discussing potential confounding factors for ratings associated with specific words
Searching for duplicated movie titles
De-duping using distinct function
Loading in and explaining glmnet package
Using movie titles to pull out ratings, with the rownames and match functions creating an index of which rating to pull out of the original dataset
Actually using glmnet function to create lasso model
Showing built-in plot of lasso lambda against mean-squared error
Explaining when certain terms appeared in the lasso model as the lambda value dropped
Gathering all variables except for title, so that the dataset is very tall
Using unite function to combine two variables (better alternative to paste)
Creating a new lasso model with many new variables in addition to plot words
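Several of the tidyr steps listed above (extract, separate_rows, gathering into a tall dataset, unite) can be sketched together on a toy example. This is a minimal sketch under stated assumptions, not the code from the screencast: the data, the column names (title, genres, year, type, value, feature), and the regular expressions are all hypothetical, and pivot_longer stands in for the gather step because it is the current tidyr equivalent.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; titles, genres, and column names are illustrative
# assumptions, not values from the screencast's dataset.
movies <- tibble(
  title  = c("Night House (2018)", "Cold Creek (2015)"),
  genres = c("Horror| Thriller", "Drama| Horror| Mystery")
)

features <- movies %>%
  # extract: capture the four digits in the trailing parentheses as `year`,
  # keeping the original title column (year stays a character string here)
  extract(title, "year", "\\((\\d{4})\\)$", remove = FALSE) %>%
  # separate_rows: one row per genre, splitting on "|" and surrounding spaces
  separate_rows(genres, sep = "\\s*\\|\\s*") %>%
  # Make the data tall: every column except title becomes a type/value pair
  pivot_longer(cols = -title, names_to = "type", values_to = "value") %>%
  # The year is repeated once per genre row, so drop the duplicates
  distinct() %>%
  # unite: combine type and value into a single feature label
  unite(feature, type, value, sep = ": ")

print(features)
```

Each row of the result pairs a movie title with one feature label such as "genres: Horror", which is the tall title/feature shape that can then be cast into a sparse matrix for a model like the lasso.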