Bob Ross Paintings
Notable topics: Network graphs, Principal Component Analysis (PCA)
Recorded on: 2019-08-11
Timestamps by: Alex Cookson
Screencast
Timestamps
Using the clean_names function from the janitor package to convert field names to snake_case
Using the gather function to convert the wide element columns into tall (tidy) format
Cleaning text (str_to_title, str_replace) to make it easier to read
Using the str_remove_all function to trim quotation marks and backslashes
Using the extract function to pull the season number and episode number out of the episode field with regex capturing groups
Using the add_count function's name argument to name the new count column
Getting into whether the elements of Ross's paintings changed over time (e.g., are mountains more/less common over time?)
Quick point: logistic regression could have been used to model how each element's frequency changes over time
Asking, "What elements tend to appear together?", prompting a clustering analysis
Using the pairwise_cor function from the widyr package to see which elements tend to appear together
Discussion of a blind spot of pairwise correlation: elements that appear only once or twice can show high or even perfect correlation
Asking, "What are clusters of elements that belong together?"
Creating network plot using ggraph and igraph packages
Reviewing network plot for interesting clusters (e.g., beach cluster, mountain cluster, structure cluster)
Explanation of Principal Component Analysis (PCA)
Start of actual PCA coding
Using the acast function from the reshape2 package to create a matrix of painting titles x painting elements (initially wrong; corrected at 36:30)
Centering the matrix data using the t function (matrix transpose), colSums, and colMeans
Using the svd function to perform singular value decomposition, then tidying the results with the broom package
Exploring one principal component to get a better feel for what PCA is doing
Using the reorder_within function from the tidytext package to reorder factors within a grouping
Exploring the different matrices (u, v, d) that make up the singular value decomposition used for PCA
Looking at top 6 principal components of painting elements
Showing the percentage of variance that each principal component accounts for
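The clean_names step converts column names to snake_case. To illustrate roughly what that conversion does, here is a simplified Python analogue using only the standard library. It is not the janitor code: to_snake_case and clean_names below are hypothetical helpers, and janitor's real function handles more cases (for example, de-duplicating repeated names) than this sketch does.

```python
import re


def to_snake_case(name: str) -> str:
    """Simplified, hypothetical stand-in for janitor::clean_names on one name."""
    # Insert an underscore at camelCase boundaries: "appleFrame" -> "apple_Frame"
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Collapse any run of non-alphanumeric characters into a single underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result
    return name.strip("_").lower()


def clean_names(columns):
    """Apply to_snake_case to every column name (no de-duplication)."""
    return [to_snake_case(c) for c in columns]


# Hypothetical column names, not taken from the actual dataset
print(clean_names(["EPISODE", "Episode Title", "AURORA_BOREALIS", "appleFrame"]))
# ['episode', 'episode_title', 'aurora_borealis', 'apple_frame']
```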
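The gather step turns one 0/1 column per painting element into one row per (painting, element) pair. The sketch below mimics that reshaping in plain Python. The gather function here is a hypothetical stand-in, not tidyr's API, and the toy rows are made up. Newer tidyr versions recommend pivot_longer() over gather() for this job.

```python
def gather(rows, id_cols, key="element", value="present"):
    """Hypothetical Python stand-in for tidyr::gather().

    Every column not listed in id_cols becomes its own row holding
    the column name (key) and the cell value (value).
    """
    tall = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        for col, val in row.items():
            if col not in id_cols:
                tall.append({**ids, key: col, value: val})
    return tall


# Toy wide data: one 0/1 column per painting element (values are made up)
wide = [
    {"episode": "S01E01", "title": "Painting A", "tree": 1, "cabin": 0},
    {"episode": "S01E02", "title": "Painting B", "tree": 1, "cabin": 1},
]

tall = gather(wide, id_cols=["episode", "title"])
# Keep only elements that actually appear in a painting
present_only = [r for r in tall if r["present"] == 1]
print(len(tall), len(present_only))  # 4 3
```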
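The text-cleaning steps (str_remove_all, str_replace, str_to_title) strip quote and backslash debris from titles and make element names readable. Below is a rough Python analogue. clean_title and clean_element are hypothetical helpers, and the sample strings are invented. Python's str.title() also treats an apostrophe as a word break ("don't" becomes "Don'T"), so its output can differ from str_to_title().

```python
import re


def clean_title(raw: str) -> str:
    """Hypothetical helper: strip quotes/backslashes, then title-case."""
    # Remove every double quote and backslash (cf. stringr::str_remove_all)
    stripped = re.sub(r'["\\]', "", raw)
    # Title-case for readability (cf. stringr::str_to_title)
    return stripped.title()


def clean_element(raw: str) -> str:
    """Hypothetical helper: underscores to spaces, then title-case."""
    # str.replace swaps *every* underscore, like str_replace_all;
    # stringr::str_replace would replace only the first match
    return raw.replace("_", " ").title()


print(clean_title('\\"WINTER CABIN\\"'))  # Winter Cabin
print(clean_element("AURORA_BOREALIS"))   # Aurora Borealis
```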
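The extract step relies on regex capturing groups: each parenthesized group in the pattern becomes a new column. The Python sketch below shows the same idea with the re module. It assumes episode codes shaped like "S01E01", and extract_season_episode is a hypothetical helper, not tidyr's extract().

```python
import re

# Assumed episode-code format such as "S01E01"; each (...) is a capturing group
EPISODE_RE = re.compile(r"S(\d+)E(\d+)")


def extract_season_episode(code):
    """Hypothetical helper: split an episode code into (season, episode)."""
    match = EPISODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"unexpected episode code: {code!r}")
    # int() converts the captured strings, much like extract(convert = TRUE)
    return int(match.group(1)), int(match.group(2))


print(extract_season_episode("S01E01"))  # (1, 1)
print(extract_season_episode("S12E07"))  # (12, 7)
```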
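The add_count step attaches each group's size to every row, and its name argument sets the new column's name (dplyr defaults to "n"). A minimal Python analogue follows. add_count here is a hypothetical stand-in, and the toy rows are made up.

```python
from collections import Counter


def add_count(rows, key, name="n"):
    """Hypothetical stand-in for dplyr::add_count(): attach each group's size.

    `name` sets the new column's name, defaulting to "n" as in dplyr.
    """
    counts = Counter(row[key] for row in rows)
    return [{**row, name: counts[row[key]]} for row in rows]


# Toy tall data: one row per (painting, element) pair
rows = [
    {"title": "Painting A", "element": "Tree"},
    {"title": "Painting B", "element": "Tree"},
    {"title": "Painting B", "element": "Cabin"},
]
for row in add_count(rows, "element", name="element_total"):
    print(row)
```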
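The aside about logistic regression suggests modeling whether an element appears in a painting as a function of time, which gives a formal trend estimate instead of an eyeballed one. In R this is commonly done with glm(..., family = "binomial"). The stdlib Python sketch below fits a one-predictor logistic regression by plain gradient ascent on the log-likelihood. fit_logistic, the learning rate, the iteration count, and the toy season/presence data are all illustrative assumptions, not the screencast's analysis.

```python
from math import exp


def fit_logistic(x, y, lr=0.5, iters=5000):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent.

    Hypothetical helper: x should be roughly centered/scaled so plain
    gradient ascent converges; y holds 0/1 outcomes.
    """
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            # Accumulate the gradient of the log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Toy data: season numbers 1..10 and whether a made-up element appeared
seasons = list(range(1, 11))
present = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
mean_season = sum(seasons) / len(seasons)
centered = [s - mean_season for s in seasons]

b0, b1 = fit_logistic(centered, present)
print(f"slope per season: {b1:.3f}")  # positive => element more common later
```

A positive slope means the element becomes more likely in later seasons. Centering the season number keeps the simple fixed-step optimizer stable.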
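The pairwise-correlation steps, and the blind spot discussed alongside them, can be seen on a tiny example. On 0/1 presence indicators, Pearson correlation is the phi coefficient. Two rare elements that happen to share a single painting come out perfectly correlated. The sketch below is a hypothetical Python stand-in for pairwise_cor, with made-up data. Dropping rare elements before correlating is one common guard; that guard is an assumption here, not something taken from the screencast.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; on 0/1 vectors this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # element in every painting or in none
    return cov / sqrt(vx * vy)


def pairwise_cor(presence):
    """Hypothetical stand-in for widyr::pairwise_cor on a dict of 0/1 lists."""
    return {
        (a, b): pearson(presence[a], presence[b])
        for a, b in combinations(sorted(presence), 2)
    }


# Toy presence data: one 0/1 entry per painting (values are made up)
presence = {
    "tree":       [1, 1, 1, 0, 1, 1],
    "cabin":      [0, 1, 1, 0, 0, 1],
    "lighthouse": [0, 0, 0, 1, 0, 0],
    "palm_tree":  [0, 0, 0, 1, 0, 0],
}
for pair, r in sorted(pairwise_cor(presence).items(), key=lambda kv: -kv[1]):
    print(pair, round(r, 3))
```

The ("lighthouse", "palm_tree") pair tops the list with a correlation of 1.0, even though it rests on a single painting.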
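The centering step subtracts each column's mean from that column before the SVD. In R, subtracting a vector from a matrix recycles the vector down the columns, so m - colMeans(m) would not line each mean up with its own column. A common idiom is to transpose first and transpose back, as in t(t(m) - colMeans(m)). Whether the screencast used exactly that expression is an assumption. Checking that the column sums of the result are (numerically) zero is one way to verify it. A plain-Python version of the same operation, with a hypothetical helper name and a toy matrix, is below.

```python
def center_columns(matrix):
    """Subtract each column's mean from that column (hypothetical helper)."""
    n_rows = len(matrix)
    col_means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, col_means)] for row in matrix]


# Toy painting x element matrix of 0/1 indicators (made-up values)
m = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]
centered = center_columns(m)
# Every column of the centered matrix should now sum to (numerically) zero
print([round(sum(col), 12) for col in zip(*centered)])
```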
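The SVD steps (exploring u, v, d and the share of variance per component) rest on the factorization X = U diag(d) V^T of the centered matrix. The columns of v are element loadings, u holds per-painting scores scaled to unit length, and component k accounts for d_k^2 / sum(d^2) of the total variance. A real analysis would call svd() in R, or numpy.linalg.svd in Python. As a stdlib-only substitute, the sketch below estimates just the leading singular triple by power iteration, which is a different technique from a full SVD. leading_component, variance_share, and the toy matrix are hypothetical, and the method assumes the largest singular value is strictly larger than the second.

```python
import random
from math import sqrt


def mat_vec(matrix, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def leading_component(matrix, iters=500, seed=0):
    """Estimate (u1, d1, v1), the top singular triple, by power iteration.

    Hypothetical stdlib-only stand-in for svd(); expects a column-centered
    matrix whose largest singular value is strictly larger than the second.
    """
    rng = random.Random(seed)
    n_cols = len(matrix[0])
    v = [rng.random() + 0.1 for _ in range(n_cols)]
    for _ in range(iters):
        xv = mat_vec(matrix, v)                          # X v
        w = [sum(row[j] * xv[i] for i, row in enumerate(matrix))
             for j in range(n_cols)]                     # X^T (X v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    xv = mat_vec(matrix, v)
    d1 = sqrt(sum(x * x for x in xv))                    # singular value
    u = [x / d1 for x in xv]                             # left singular vector
    return u, d1, v


def variance_share(matrix, d):
    """Fraction of total variance carried by a component with singular value d."""
    total = sum(x * x for row in matrix for x in row)    # equals sum of all d_k^2
    return d * d / total


# Toy column-centered matrix with singular values sqrt(18) and sqrt(2)
X = [[3, 0], [-3, 0], [0, 1], [0, -1]]
u1, d1, v1 = leading_component(X)
print(round(d1 ** 2, 6), round(variance_share(X, d1), 6))  # 18.0 0.9
```

In this toy case the first component carries 18 / (18 + 2) = 90% of the variance, which matches the d^2 / sum(d^2) formula.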