Tennis Tournaments
Notable topics: NA
Recorded on: 2019-04-08
Timestamps by: Alex Cookson
Timestamps
Identifying duplicated rows and fixing them
Using add_count and fct_reorder functions to order categories that are broken down into sub-categories for graphing
Tidying graph titles (e.g., replacing underscores with spaces) using str_to_title and str_replace functions
Using inner_join function to merge datasets
Calculating age from date of birth using difftime and as.numeric functions
Adding simple calculations like mean and median into the text portion of a markdown document
Looking at distribution of wins by sex using overlapping histograms
Binning years into decades using truncated division %/%
Splitting up boxplots so that they are separated into pairs (M/F) across a different group (decade) using interaction function
Analyzing distribution of ages across decades, looking specifically at the effect of Serena Williams (one individual having a disproportionate effect on the data, making it look like there's a trend)
Avoiding double-counting of individuals by using each player's average age instead of their age at each win
Starting analysis to predict winner of Grand Slam tournaments
Creating rolling count of previous tournament experience using row_number function
Creating rolling win count using cumsum function
Lagging rolling win count using lag function (otherwise, for prediction purposes, each row would include information about a win before the player has actually won it)
Asking, "When someone is a finalist, what is their probability of winning as a function of previous tournaments won?"
Asking, "How does the number of wins a finalist has affect their chance of winning?"
Backtesting simple classifier where person with more tournament wins is predicted to win the given tournament
Creating classifier that gives points based on how far a player got in previous tournaments
Using match function to turn name of round reached (1st round, 2nd round, …) into a numeric score (1, 2, …)
Using cummean function to get score of average past performance (instead of cumsum function)
Pulling names of rounds (1st round, 2nd round, …) based on the rounded numeric score of previous performance
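The title-tidying step listed above (replacing underscores with spaces, then applying title case) can be sketched roughly as follows. This is a minimal illustration, not the code from the screencast; the variable name `metric` and its value are hypothetical.

```r
library(stringr)

# Hypothetical column name to be turned into a graph title
metric <- "tournament_wins"

# str_replace() swaps only the first match; str_replace_all() swaps every
# underscore, which is safer when a name contains more than one
plot_title <- str_to_title(str_replace_all(metric, "_", " "))

print(plot_title)
```

For a name with a single underscore, `str_replace` and `str_replace_all` give the same result.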
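The age and decade steps listed above (age from date of birth with difftime and as.numeric, decades via truncated division, M/F pairs per decade via interaction, and one average age per player) might look something like the sketch below. The data frame `winners`, its column names, and every value in it are made up for illustration, and dividing by 365.25 is an assumed approximation for converting days to years.

```r
library(dplyr)

# Hypothetical toy data: one row per tournament win
winners <- tibble(
  player          = c("P1", "P1", "P2"),
  gender          = c("F", "F", "M"),
  date_of_birth   = as.Date(c("1980-01-01", "1980-01-01", "1970-01-01")),
  tournament_date = as.Date(c("2000-01-01", "2004-01-01", "1995-01-01"))
)

ages <- winners %>%
  mutate(
    # difftime() gives a time difference; as.numeric() strips the units
    age = as.numeric(difftime(tournament_date, date_of_birth, units = "days")) / 365.25,
    year = as.integer(format(tournament_date, "%Y")),
    # Truncated division drops the last digit: 2004 %/% 10 is 200
    decade = 10 * (year %/% 10),
    # One level per decade/gender combination, usable as a boxplot grouping
    decade_gender = interaction(decade, gender)
  )

# One row per player, so a frequent winner is not counted once per win
avg_ages <- ages %>%
  group_by(player, gender) %>%
  summarise(avg_age = mean(age), .groups = "drop")

print(ages)
print(avg_ages)
```

In ggplot2, an `interaction()` of this kind is typically passed to the `group` aesthetic so that each decade gets its own pair of boxplots.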
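The prediction features listed above (rolling counts with row_number and cumsum, lagging with lag, round scores with match, and average past performance with cummean) can be sketched together as below. This is an illustration under stated assumptions rather than the screencast's actual code: the data frame `results`, its columns, the values, and the round ordering in `round_levels` are all hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one row per player per tournament
results <- tibble(
  player = c("A", "A", "A", "B", "B", "B"),
  tournament_date = as.Date(c("2015-01-01", "2015-06-01", "2016-01-01",
                              "2015-01-01", "2015-06-01", "2016-01-01")),
  round_reached = c("Won", "2nd round", "Won", "Finalist", "Won", "1st round")
)

# Assumed ordering of rounds from worst to best result
round_levels <- c("1st round", "2nd round", "Finalist", "Won")

features <- results %>%
  group_by(player) %>%
  arrange(tournament_date, .by_group = TRUE) %>%
  mutate(
    won = round_reached == "Won",
    # match() returns the position of each round name in round_levels
    round_score = match(round_reached, round_levels),
    # Tournaments played before this one
    previous_tournaments = row_number() - 1,
    # cumsum() includes the current row, so lag() shifts it back by one;
    # without the lag, the feature would leak the result being predicted
    previous_wins = lag(cumsum(won), default = 0L),
    # Average score over earlier tournaments only (NA for a player's first)
    previous_avg_score = lag(cummean(round_score)),
    # Turn the rounded average back into a round name
    previous_avg_round = round_levels[round(previous_avg_score)]
  ) %>%
  ungroup()

print(features)
```

A simple backtest of the "more previous wins" classifier would then compare `previous_wins` between the two finalists of each tournament and check how often the player with the higher count actually won.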