NCAA Women’s Basketball
Notable topics: Heatmap, Correlation analysis
Recorded on: 2020-10-05
Timestamps by: Eric Fletcher
Timestamps
Use fct_relevel from the forcats package to order the factor levels for the tourney_finish variable.
Use geom_tile from the ggplot2 package to create a heatmap showing how far teams with each seed tend to go in the tournament.
Use scale_y_continuous from the ggplot2 package with breaks = seq(1, 16) so that all 16 seeds are labeled on the y-axis.
Use geom_text from the ggplot2 package with label = percent(pct) (percent comes from the scales package) to print the percentage on each tile of the heatmap.
Use scale_x_discrete and scale_y_continuous, both with expand = c(0, 0), to remove the padding between the x and y axes and the heatmap tiles. David calls this flattening.
Use scale_y_reverse to reverse the y-axis so that seed 1 sits at the top of the heatmap and seed 16 at the bottom.
Use cor from the stats package to calculate the correlation between seed and tourney_finish, then plot that correlation by year to see whether it changes over time.
Use geom_smooth with method = "loess" to add a smoothing line with a confidence band that helps show the trend between seed and reg_percent.
Use fct_lump from the forcats package to lump together all conferences except the n most frequent into an "Other" level.
Use geom_jitter from the ggplot2 package instead of geom_boxplot to show the individual points that make up the distribution of the seed variable; jittering avoids overplotting, so each point stays visible.
Use geom_smooth with method = "lm" to add a linear trend line that shows the relationship between reg_percent and tourney_w.
Create a reusable function with a dot pipe (a pipeline that starts with ., as in . %>% summarize(...)) so the same summary statistics don't have to be duplicated in each summarize call.
Use glue from the glue package to combine school and n_entries into the y-axis labels of a geom_col bar chart.
Summary of screencast.
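The heatmap steps above (fct_relevel, geom_tile, the seed breaks, the percent labels, expand = c(0, 0) and scale_y_reverse) can be combined roughly as in the sketch below. It is not the code from the screencast: the tournament tibble is simulated, and the labels in finish_levels are assumed round codes ordered from earliest exit to champion. Because scale_y_reverse replaces scale_y_continuous for the y aesthetic, the breaks and expand arguments are passed to scale_y_reverse here rather than to a separate scale_y_continuous call.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(scales)

# Simulated stand-in for the tournament data (hypothetical values).
# finish_levels is an assumed ordering from earliest exit to champion.
set.seed(2020)
finish_levels <- c("1st", "2nd", "RSF", "RF", "NSF", "N2nd", "Champ")
tournament <- tibble(
  seed = rep(1:16, each = 20),
  tourney_finish = sample(finish_levels, 320, replace = TRUE)
)

# Order the finish levels, then compute the share of each finish within each seed
heatmap_data <- tournament %>%
  mutate(tourney_finish = fct_relevel(tourney_finish, finish_levels)) %>%
  count(seed, tourney_finish) %>%
  group_by(seed) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

p <- ggplot(heatmap_data, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +
  geom_text(aes(label = percent(pct, accuracy = 1)), size = 3) +
  # expand = c(0, 0) removes the padding between the axes and the tiles
  scale_x_discrete(expand = c(0, 0)) +
  # Reversed axis puts seed 1 at the top; breaks label every seed
  scale_y_reverse(breaks = seq(1, 16), expand = c(0, 0)) +
  scale_fill_gradient(low = "white", high = "steelblue", labels = percent) +
  labs(x = "Tournament finish", y = "Seed", fill = "% of teams")
```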
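For the correlation step, one way to compute and plot a per-year correlation is sketched below on simulated data. cor needs numeric inputs, so this sketch assumes the ordered tourney_finish factor is converted to its integer position (1 = earliest exit); the year column, the simulated values and the finish codes are hypothetical. A negative correlation here means that teams with lower seed numbers tend to go further.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical finish codes, ordered from earliest exit to champion
finish_levels <- c("1st", "2nd", "RSF", "RF", "NSF", "N2nd", "Champ")

# Simulated seasons: better seeds (smaller numbers) tend to advance further
tournament <- tibble(
  year = rep(2000:2009, each = 64),
  seed = rep(rep(1:16, each = 4), times = 10)
) %>%
  mutate(
    round_reached = pmin(7L, pmax(1L, as.integer(round(7 - seed / 2.5 + rnorm(n()))))),
    tourney_finish = factor(finish_levels[round_reached], levels = finish_levels)
  )

# One correlation per year, using the factor's integer position as the finish
yearly_cor <- tournament %>%
  group_by(year) %>%
  summarize(correlation = cor(seed, as.integer(tourney_finish)))

p <- ggplot(yearly_cor, aes(year, correlation)) +
  geom_line() +
  geom_point() +
  labs(y = "Correlation between seed and finish")
```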
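The fct_lump and geom_jitter steps can be sketched together as below. The conference names, their frequencies and the seeds are simulated placeholders, and n = 5 is an arbitrary choice for the number of conferences to keep. Setting height = 0 in geom_jitter keeps every point on its integer seed value and only spreads points horizontally within a conference.

```r
library(dplyr)
library(forcats)
library(ggplot2)

set.seed(3)
# Simulated team-seasons with hypothetical conference labels of unequal frequency
conferences <- c("Conf A", "Conf B", "Conf C", "Conf D",
                 "Conf E", "Conf F", "Conf G", "Conf H")
teams <- tibble(
  conference = sample(conferences, 200, replace = TRUE,
                      prob = c(0.20, 0.20, 0.18, 0.15, 0.12, 0.08, 0.04, 0.03)),
  seed = sample(1:16, 200, replace = TRUE)
)

# Keep the 5 most frequent conferences; the rest are lumped into "Other"
lumped <- teams %>%
  mutate(conference = fct_lump(conference, n = 5))

# Jittered points instead of a boxplot: every team-season stays visible
p <- ggplot(lumped, aes(conference, seed)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  scale_y_reverse(breaks = seq(1, 16))
```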
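The two geom_smooth steps differ only in the smoothing method: loess fits a flexible local curve, while lm fits a straight line. The sketch below shows both on simulated data; the reg_percent values (on a 0-100 scale here) and the tourney_w counts are hypothetical, and the explicit lm() call is an extra step added only to expose the slope that the lm line draws.

```r
library(dplyr)
library(ggplot2)

set.seed(4)
# Simulated team-seasons (hypothetical): better seeds have higher regular-season win %
teams <- tibble(seed = rep(1:16, each = 25)) %>%
  mutate(
    reg_percent = pmin(100, 95 - 2 * seed + rnorm(n(), sd = 5)),
    tourney_w = pmax(0, round((reg_percent - 60) / 8 + rnorm(n())))
  )

# Local (loess) smoother with its default confidence band
p_loess <- ggplot(teams, aes(seed, reg_percent)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess")

# Straight-line (linear model) trend
p_lm <- ggplot(teams, aes(reg_percent, tourney_w)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm")

# The same linear fit, computed directly to inspect its slope
fit <- lm(tourney_w ~ reg_percent, data = teams)
coef(fit)
```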
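A dot pipe turns a pipeline that starts with . into a reusable function (magrittr calls this a functional sequence). In the sketch below, summarize_entries is a hypothetical name and the small teams table is made up; the point is that the summary statistics are written once and then applied to two different groupings.

```r
library(dplyr)

# Made-up example rows (hypothetical schools and conferences)
teams <- tibble(
  school = c("School A", "School A", "School B", "School B", "School B", "School C"),
  conference = c("Conf X", "Conf X", "Conf Y", "Conf Y", "Conf Y", "Conf X"),
  seed = c(1, 1, 1, 2, 3, 2),
  tourney_finish = c("Champ", "NSF", "Champ", "RF", "2nd", "N2nd")
)

# Starting a pipeline with `.` creates a function instead of evaluating it
summarize_entries <- . %>%
  summarize(
    n_entries = n(),
    avg_seed = mean(seed),
    pct_champion = mean(tourney_finish == "Champ")
  )

# Reuse the same summary for two groupings
by_school <- teams %>%
  group_by(school) %>%
  summarize_entries() %>%
  arrange(desc(n_entries))

by_conference <- teams %>%
  group_by(conference) %>%
  summarize_entries()
```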
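Building the axis labels with glue can look like the sketch below. The school_summary table and its pct_champion column are hypothetical, and reorder is used only so the bars come out sorted; the glue call pastes each school's entry count into its label so that the count shows up on the geom_col y-axis.

```r
library(dplyr)
library(ggplot2)
library(glue)

# Hypothetical per-school summary
school_summary <- tibble(
  school = c("School A", "School B", "School C"),
  n_entries = c(20L, 15L, 8L),
  pct_champion = c(0.10, 0.05, 0)
)

plot_data <- school_summary %>%
  mutate(
    # e.g. "School A (20)"; converted to character before reordering
    school_label = as.character(glue("{school} ({n_entries})")),
    school_label = reorder(school_label, pct_champion)
  )

# Continuous x with discrete y: geom_col draws horizontal bars
p <- ggplot(plot_data, aes(pct_champion, school_label)) +
  geom_col() +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "Share of entries that won the title", y = NULL)
```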