Wine Ratings
Notable topics: Text mining (tidytext package), LASSO regression (glmnet package)
Recorded on: 2019-05-30
Timestamps by: Alex Cookson
Screencast
Timestamps
Using extract function from tidyr package to pull out year from text field
Changing extract function to pull out year column more accurately
Starting to explore prediction of points
Using fct_lump on country variable to collapse countries into an "Other" category, then fct_relevel to set the baseline category for a linear model
Investigating year as a potential confounding variable
Investigating "taster_name" as a potential confounding variable
Coefficient (TIE fighter) plot to see effect size of terms in a linear model, using tidy function from broom package
Polishing category names for presentation in graph using str_replace function
Using augment function to add predictions of linear model to original data
Plotting predicted points vs. actual points
Using ANOVA to determine the amount of variation that is explained by different terms
Using tidytext package to set up wine review text for Lasso regression
Setting up and using the pairwise_cor function (widyr package) to look at words that appear in reviews together
Creating a sparse matrix with the cast_sparse function from the tidytext package, to be used in a regression on positive/negative words
Checking if rownames of sparse matrix correspond to the wine_id values they represent
Setting up the sparse matrix for sparse (Lasso) regression with the glmnet package
Actually writing code for doing Lasso regression
Basic explanation of Lasso regression
Putting Lasso model into tidy format
Explaining how the number of terms increases as lambda (penalty parameter) decreases
Answering how we choose a lambda value (penalty parameter) for Lasso regression
Using parallelization for intensive computations
Adding price (from original linear model) to Lasso regression
Showing the glmnet.fit component of a Lasso (glmnet) model
Picking a lambda value (penalty parameter) and explaining which one to pick
Taking the most extreme coefficients (positive and negative) by grouping them by direction
Demonstrating tidytext package's sentiment lexicon, then looking at individual reviews to demonstrate the model
Visualizing each coefficient's effect on a single review
Using str_trunc to truncate character strings
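The linear-model steps in the timestamps (fct_lump and fct_relevel on country, lm, broom's tidy and augment, then anova) can be sketched roughly as below. This is a minimal sketch, not the code from the screencast. It assumes the dplyr, forcats, and broom packages, and the `wines` data frame, its country counts, and the price effect are invented for illustration rather than taken from the wine ratings data.

```r
library(dplyr)
library(forcats)
library(broom)

set.seed(2019)
# Hypothetical toy data (NOT the real wine ratings): 500 wines from 6 countries
wines <- tibble(
  country = rep(c("US", "France", "Italy", "Spain", "Chile", "Greece"),
                times = c(200, 100, 100, 50, 25, 25)),
  price = exp(rnorm(500, mean = 3, sd = 0.7))
) %>%
  # Invented relationship: points rise by about 2 per doubling of price
  mutate(points = 85 + 2 * log2(price) + rnorm(n(), sd = 2))

# Keep the 4 most common countries, lump the rest into "Other",
# then make "US" the baseline level for the linear model
wines <- wines %>%
  mutate(country = fct_relevel(fct_lump(country, n = 4), "US"))

model <- lm(points ~ log2(price) + country, data = wines)

tidy(model, conf.int = TRUE)   # one row per term, suitable for a coefficient (TIE fighter) plot
predictions <- augment(model)  # model data plus .fitted (predicted points) and .resid
anova(model)                   # sum of squares attributed to each term
```

Setting the baseline with fct_relevel means each country coefficient reads as "difference from US", which is what makes the coefficient plot interpretable.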
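The text-mining and Lasso steps (tokenizing reviews with tidytext, building a sparse matrix with cast_sparse, matching rows back to wine_id, then fitting a Lasso with glmnet and picking lambda by cross-validation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the screencast's code: it assumes the dplyr, tidytext, and glmnet packages, and the toy reviews, vocabulary, and per-word effects are hypothetical, invented so the example runs on its own.

```r
library(dplyr)
library(tidytext)
library(glmnet)

set.seed(2019)
# Hypothetical toy reviews (NOT the real wine data): each is 3-6 words
vocab <- c("rich", "elegant", "complex", "thin", "watery", "simple", "fruit", "oak")
effects <- c(rich = 2, elegant = 1.5, complex = 1,
             thin = -2, watery = -1.5, simple = -1)  # "fruit", "oak": no effect

n_reviews <- 200
reviews <- tibble(
  wine_id = seq_len(n_reviews),
  description = replicate(n_reviews,
                          paste(sample(vocab, sample(3:6, 1)), collapse = " "))
)

# Invented scoring rule: baseline 88 plus the effect of each word present, plus noise
score_words <- function(text) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  sum(effects[intersect(words, names(effects))])
}
reviews <- reviews %>%
  mutate(points = 88 + vapply(description, score_words, numeric(1),
                              USE.NAMES = FALSE) + rnorm(n(), sd = 0.5))

# One row per (review, word), then a sparse review-by-word 0/1 matrix
review_words <- reviews %>%
  unnest_tokens(word, description) %>%
  distinct(wine_id, word)
X <- cast_sparse(review_words, wine_id, word)

# Line up the response with the matrix rows via the rownames (= wine_id)
y <- reviews$points[match(rownames(X), reviews$wine_id)]

# Lasso (alpha = 1 is glmnet's default), with lambda chosen by cross-validation
cv_fit <- cv.glmnet(X, y)
lasso_coefs <- coef(cv_fit, s = "lambda.1se")
lasso_coefs
```

`cv_fit$glmnet.fit` holds the full coefficient path, where fewer terms are nonzero at larger lambda; broom's `tidy()` can turn it into one row per term and lambda. For data the size of the real reviews, cv.glmnet's `parallel = TRUE` argument fits the folds on a foreach backend that must be registered beforehand (for example with doParallel).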