TV Golden Age
Notable topics: Data manipulation, Logistic regression
Recorded on: 2019-01-08
Timestamps by: Alex Cookson
Timestamps
Investigating an inconsistency: some shows have a count of seasons that differs from the number of seasons given in the data
Using %in% operator and all function to only get shows that have a first season and don't have skipped seasons in the data
Using facet_wrap function to separate different shows on a line graph into multiple small graphs
Writing a custom embedded function so that the breaks on the x-axis always fall on even numbers (e.g., season 2, 4, 6, etc.)
Committing, finding, and explaining a common error of using the same variable name when summarizing multiple things
Using truncated division operator %/% to bin data into two-year bins instead of annual (e.g., 1990 and 1991 get binned to 1990)
Using subsetting (with square brackets) within the mutate function to calculate mean on only a subset of data (without needing to filter)
Using gather function (now pivot_longer) to convert metrics stored as columns into tidy format, in order to graph them all at once with a facet_wrap
Using pmin function to lump all seasons after 4 into one row (it still shows "4", but it represents "4+")
Using paste0 and spread functions to get season 1-3 ratings into three columns, one for each season
Using distinct function with .keep_all argument to remove duplicates by only keeping the first one that appears
Using logistic regression to answer, "Does season 1 rating affect the probability of getting a second season?" (note he forgets to specify the family argument, fixed at 57:25)
Using ntile function to divide data into N bins (5 in this case), then eventually using cut function instead
Using augment function as a method of visualizing and interpreting coefficients of regression model
Using crossing function to create new data to test the logistic regression model on and interpret model coefficients
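A few of the techniques listed above can be sketched in base R on toy data. This is a minimal illustration, not the code from the screencast: the data frames, column names (title, season, year, rating, s1_rating, renewed), and values are all hypothetical, and base R stands in for the dplyr verbs used in the recording.

```r
# Hypothetical toy data: one row per show-season (all names and values are made up).
tv <- data.frame(
  title  = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
  season = c(1, 2, 3, 1, 3, 1, 2, 3, 4, 5),
  year   = c(1990, 1991, 1992, 1994, 1996, 1997, 1998, 1999, 2000, 2001),
  rating = c(8.1, 8.3, 7.9, 6.5, 6.0, 7.2, 7.4, 7.8, 8.0, 7.5)
)

# %in% with all(): a show is "complete" if every season from 1 to its
# highest season number appears in the data (show B skips season 2).
complete <- tapply(tv$season, tv$title,
                   function(s) all(seq_len(max(s)) %in% s))
kept <- tv[tv$title %in% names(complete)[complete], ]

# %/% (truncated division): bin years into two-year bins,
# so 1990 and 1991 both land in the 1990 bin.
tv$year_bin <- 2 * (tv$year %/% 2)

# pmin(): lump every season after 4 into the value 4 (read it as "4+").
tv$season_lumped <- pmin(tv$season, 4)

# Square-bracket subsetting: mean rating of season 1 only, without filtering tv.
mean_s1_rating <- mean(tv$rating[tv$season == 1])

# Logistic regression: family = binomial is the argument that is easy to forget.
# Without it, glm() falls back to the gaussian family (ordinary least squares).
shows <- data.frame(
  s1_rating = c(6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 6.2, 7.2, 8.2, 7.8),
  renewed   = c(0,   0,   1,   0,   1,   1,   1,   0,   1,   1)
)
fit <- glm(renewed ~ s1_rating, data = shows, family = binomial)

# Predicted probability of a second season at two hypothetical season 1 ratings.
new_data <- data.frame(s1_rating = c(6, 8))
new_data$p_renewed <- predict(fit, newdata = new_data, type = "response")
```

In a tidyverse workflow the same ideas would normally sit inside group_by, filter, and mutate calls; the base R versions are used here only so the sketch runs without extra packages.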