Medium Articles
Notable topics: Text mining (tidytext package)
Recorded on: 2018-12-03
Timestamps by: Alex Cookson
Screencast
Timestamps
Using summarise_at and starts_with functions to quickly sum up all variables starting with "tag_"
Using gather function (now pivot_longer) to convert topic tag variables from wide to tall (tidy) format
Explanation of using median (instead of mean) as measure of central tendency for number of claps an article got
Changing scale_x_continuous function's breaks argument to get custom labels and tick marks on a histogram
Discussion of using mean vs. median as measure of central tendency for reading time (he decides on mean)
Using unnest_tokens function from tidytext package to split character string into individual words
Explanation of stop words and using anti_join function from dplyr package (with stop_words dataset from tidytext package) to get rid of them
Quick analysis of which individual words are associated with more/fewer claps ("What are the hype words?")
Using geometric mean as alternative to median to get more distinction between words (note 27:33 where he makes a quick fix)
Finding correlated pairs of words using pairwise_cor function from widyr package
Filtering original data to only include words that appear in the network plot (the 150 most correlated word pairs)
Changing default colour scale to one with Low = Blue and High = Red with scale_colour_gradient2 function
Explanation of data format needed to conduct Lasso regression (and using cast_sparse function to get sparse matrix)
Using cv.glmnet function (cv = cross-validated) from glmnet package to run Lasso regression
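The summarise_at / starts_with and wide-to-tall steps from the timestamps above can be sketched roughly as follows. The `articles` tibble and its column names are made up for illustration and are not the screencast's actual data; pivot_longer is used in place of gather.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the Medium articles dataset
articles <- tibble(
  title = c("a", "b", "c"),
  claps = c(10, 200, 35),
  tag_ai = c(1, 0, 1),
  tag_data_science = c(1, 1, 0)
)

# Sum every column whose name starts with "tag_"
tag_totals <- articles %>%
  summarise_at(vars(starts_with("tag_")), sum)

# Reshape the tag columns from wide to tall: one row per article-tag pair,
# keeping only the tags an article actually has
articles_long <- articles %>%
  pivot_longer(starts_with("tag_"), names_to = "tag", values_to = "value") %>%
  filter(value == 1)
```

Once the tags are in tall format, per-tag summaries such as the median number of claps are a plain group_by(tag) followed by summarise.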
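A minimal sketch of the tokenizing, stop-word removal, and geometric-mean steps listed above. The titles and clap counts are invented, and the `+ 1` offset inside the log is an assumption made here so that articles with zero claps do not break the calculation.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per article
articles <- tibble(
  post_id = 1:3,
  title = c("The rise of python", "Python and the blockchain", "A neural blockchain"),
  claps = c(99, 0, 9)
)

# Split each title into one lowercase word per row, then drop stop words
# (anti_join comes from dplyr; stop_words is a dataset shipped with tidytext)
words <- articles %>%
  unnest_tokens(word, title) %>%
  anti_join(stop_words, by = "word")

# Geometric mean of claps per word, offset by 1 to tolerate zeros (assumption)
word_claps <- words %>%
  group_by(word) %>%
  summarise(
    geometric_mean_claps = exp(mean(log(claps + 1))) - 1,
    n = n()
  )
```

Compared with the median, the geometric mean still damps the effect of a few viral articles but takes more distinct values, which is what gives more separation between words.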
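The pairwise_cor step and the filter back to the original data might look something like this. The `post_words` table is a made-up example; the cut-off of 150 pairs follows the timestamps above.

```r
library(dplyr)
library(widyr)

# Hypothetical tidy data: one row per (post, word) pair
post_words <- tibble(
  post_id = c(1, 1, 2, 2, 3, 3, 4),
  word = c("python", "pandas", "python", "pandas", "neural", "network", "python")
)

# Correlation between words, based on which posts they appear in together
word_cors <- post_words %>%
  pairwise_cor(word, post_id, sort = TRUE)

# Keep the most correlated pairs (at most 150) and restrict the
# original data to the words that appear in those pairs
top_cors <- head(word_cors, 150)

network_words <- post_words %>%
  filter(word %in% c(top_cors$item1, top_cors$item2))
```

pairwise_cor returns one row per ordered pair, with columns item1, item2, and correlation, so each pair shows up twice (once in each direction).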
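For the Lasso steps, here is a hedged sketch of going from tidy (post, word) rows to a sparse matrix and fitting with cv.glmnet. All data is simulated: the vocabulary, the number of posts, and the rule that "neural" raises log claps are assumptions made only so the example runs.

```r
library(dplyr)
library(tidytext)
library(glmnet)

set.seed(2018)
n_posts <- 200
vocab <- c("python", "neural", "blockchain", "startup", "design", "crypto")

# Simulated tidy data: three distinct words per post
post_words <- tibble(
  post_id = rep(seq_len(n_posts), each = 3),
  word = unlist(lapply(seq_len(n_posts), function(i) sample(vocab, 3)))
)

# Simulated response: posts containing "neural" get higher log claps
posts <- post_words %>%
  group_by(post_id) %>%
  summarise(has_neural = any(word == "neural")) %>%
  mutate(log_claps = 2 * has_neural + rnorm(n(), sd = 0.5))

# Sparse matrix with one row per post and one column per word
word_matrix <- post_words %>%
  cast_sparse(post_id, word)

# Line the response up with the matrix rows (row names are the post ids)
log_claps <- posts$log_claps[match(as.integer(rownames(word_matrix)), posts$post_id)]

# Cross-validated Lasso (alpha = 1 is the glmnet default)
lasso_model <- cv.glmnet(word_matrix, log_claps)
coefs <- coef(lasso_model, s = "lambda.1se")
```

The match() call matters because the row order of the sparse matrix is not guaranteed to match the order of the response table. Printing `coefs` shows which words keep a non-zero coefficient at the chosen penalty.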