Animal Crossing
Notable topics: Topic modelling (stm package)
Recorded on: 2020-05-04
Timestamps by: Alex Cookson
Screencast
Timestamps
Starting text analysis of critic reviews of Animal Crossing
Using floor_date function from lubridate package to round dates down to nearest month (then week)
Using unnest_tokens function from tidytext package and anti_join function from dplyr package to break reviews into individual words and remove stop words
Taking the average rating associated with individual words (simple approach to gauge sentiment)
Using geom_line and geom_point to graph ratings over time
Using mean function and logical statement to calculate percentages that meet a certain condition
Using geom_text to visualize what words are associated with positive/negative reviews
Disclaimer that this exploration is not text regression -- wine ratings screencast is a good resource for that
Starting to do topic modelling
Explanation of stm function from stm package
Explanation of stm function's output (topic modelling output)
Changing the number of topics from 4 to 6
Explanation of how topic modelling works conceptually
Using tidy function (the method for stm models comes from the tidytext package) to find which "documents" (reviews) were the "strongest" representation of each topic
Noting that there might be a scraping issue resulting in review text being repeated
(Unsuccessfully) Using str_sub function to help fix repeated review text by locating where the review text starts being repeated
(Unsuccessfully) Using str_replace and map2_chr functions, as well as regex capturing groups, to fix repeated text
Looking at the association between review grade and gamma of the topic model (how strongly a review represents a topic)
Using cor function with method = "spearman" to calculate correlation based on rank instead of actual values
Summary of screencast
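
Sketch for the floor_date item: a minimal example of rounding dates down to the month and then the week. The dates below are made up for illustration and are not from the review data.

```r
library(lubridate)

# Hypothetical review dates (not from the actual dataset)
review_dates <- as.Date(c("2020-03-20", "2020-04-02", "2020-04-28"))

# Round each date down to the first day of its month
by_month <- floor_date(review_dates, "month")

# Round each date down to the start of its week
# (lubridate's default week start is Sunday unless the option is changed)
by_week <- floor_date(review_dates, "week")

print(by_month)
print(by_week)
```

Grouping by the rounded date is what allows ratings to be averaged per month or per week before plotting.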
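
Sketch for the unnest_tokens / anti_join and average-rating items: one way to tokenize reviews, drop stop words, and average the grade per word. The `reviews` data frame and its column names (`id`, `grade`, `text`) are toy assumptions, not the screencast's actual data.

```r
library(dplyr)
library(tidytext)

# Toy reviews (hypothetical data and column names)
reviews <- tibble(
  id = 1:3,
  grade = c(9, 4, 8),
  text = c(
    "A charming and relaxing game",
    "The game feels slow and tedious",
    "Relaxing island life"
  )
)

word_ratings <- reviews %>%
  unnest_tokens(word, text) %>%             # one row per (review, word), lowercased
  anti_join(stop_words, by = "word") %>%    # drop common words such as "the", "and"
  group_by(word) %>%
  summarize(
    n_reviews = n_distinct(id),             # how many reviews use the word
    avg_grade = mean(grade)                 # average grade of reviews containing it
  ) %>%
  arrange(desc(avg_grade))

print(word_ratings)
```

This is only a rough gauge of sentiment: a word inherits the grade of every review it appears in, so rare words can look extreme by chance.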
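
Sketch for the "mean function and logical statement" item: taking the mean of a logical vector gives the share of TRUE values, because TRUE counts as 1 and FALSE as 0. The grades and the cutoff of 8 are made up for illustration.

```r
# Hypothetical grades on a 0-10 scale
grades <- c(9, 3, 10, 7, 8)

# grades >= 8 is a logical vector; its mean is the proportion of TRUEs
pct_positive <- mean(grades >= 8)

print(pct_positive)
```

The same expression works inside summarize, for example to get the share of reviews meeting a condition per week.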
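
Sketch for the stm and tidy items: a possible shape for the topic-modelling step, written as a helper function. The function name `fit_review_topics`, the input `review_words`, and its columns (`review_id`, `word`) are hypothetical; the screencast's own code may differ.

```r
library(dplyr)
library(tidytext)
library(stm)

# `review_words` is assumed to have one row per (review, word) after
# tokenizing and removing stop words, with columns `review_id` and `word`.
fit_review_topics <- function(review_words, n_topics = 6) {
  # Build a sparse document-term matrix: one row per review, one column per word
  review_matrix <- review_words %>%
    count(review_id, word) %>%
    cast_sparse(review_id, word, n)

  # Fit a structural topic model with K topics
  topic_model <- stm(
    review_matrix,
    K = n_topics,
    verbose = FALSE,
    init.type = "Spectral"
  )

  list(
    model = topic_model,
    # beta: probability of each word within each topic
    beta = tidy(topic_model),
    # gamma: proportion of each document attributed to each topic
    gamma = tidy(
      topic_model,
      matrix = "gamma",
      document_names = rownames(review_matrix)
    )
  )
}
```

Sorting the gamma table in descending order within each topic shows which reviews are the strongest representation of that topic. Changing `n_topics` from 4 to 6 corresponds to changing `K`.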
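
Sketch for the cor item: Spearman correlation is computed on ranks, so it measures whether two variables move together monotonically rather than linearly. The grade and gamma values below are invented for illustration.

```r
# Hypothetical review grades and the gamma of one topic for the same reviews
grade <- c(2, 5, 7, 9, 10)
gamma <- c(0.90, 0.60, 0.55, 0.20, 0.01)

# Default (Pearson) correlation uses the actual values
pearson <- cor(grade, gamma)

# Spearman correlation uses the ranks of the values instead
spearman <- cor(grade, gamma, method = "spearman")

print(pearson)
print(spearman)
```

Here gamma falls every time grade rises, so the rank-based correlation is exactly -1 even though the relationship is not perfectly linear.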