NCAA Women’s Basketball
Notable topics: Heatmap, Correlation analysis
Recorded on: 2020-10-05
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use fct_relevel from the forcats package to order the factor levels for the tourney_finish variable.
Use geom_tile from the ggplot2 package to create a heatmap showing how far each seed tends to advance in the tournament.
Use scale_y_continuous from the ggplot2 package with breaks = seq(1, 16) in order to include all 16 seeds.
Use geom_text from the ggplot2 package with label = percent(pct) to print the percentage on each tile in the heatmap.
Use scale_x_discrete and scale_y_continuous, both with expand = c(0, 0), to remove the padding between the axes and the heatmap tiles. David calls this flattening.
Use scale_y_reverse to flip the order of the y-axis from 1-16 to 16-1.
Use cor from the stats package to calculate the correlation between seed and tourney_finish. The result is then plotted to see whether the correlation changes over time.
Use geom_smooth with method = "loess" to add a smoothing line with a confidence band, which makes the trend between seed and reg_percent easier to see.
Use fct_lump from the forcats package to lump together all conferences except the n most frequent.
Use geom_jitter from the ggplot2 package instead of geom_boxplot to avoid overplotting, which makes it easier to see the individual points that make up the distribution of the seed variable.
Use geom_smooth with method = "lm" to aid in seeing the trend between reg_percent and tourney_w.
Create a dot pipe function using . and %>% to avoid duplicating summary statistics with summarize.
Use glue from the glue package to concatenate school and n_entries for the labels on the geom_col y-axis.
Summary of screencast.
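
The heatmap steps listed above (fct_relevel, geom_tile, geom_text, expand = c(0, 0), scale_y_reverse) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the screencast: the toy data is randomly generated, the finish_levels vector and the fill scale are illustrative choices, and breaks and expand are passed directly to scale_y_reverse instead of a separate scale_y_continuous call.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(scales)

# Toy data, randomly generated for illustration only (not the real dataset).
# The level names below are an assumption about how finishes might be coded.
set.seed(2020)
finish_levels <- c("1st", "2nd", "RSF", "RF", "NSF", "N2nd", "Champ")
tourney <- tibble(
  seed = sample(1:16, 500, replace = TRUE),
  tourney_finish = sample(finish_levels, 500, replace = TRUE)
)

heatmap_data <- tourney %>%
  # Put the finishes in tournament order instead of alphabetical order
  mutate(tourney_finish = fct_relevel(tourney_finish, finish_levels)) %>%
  count(seed, tourney_finish) %>%
  # Share of each seed's entries that ended at each finish
  group_by(seed) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

p <- ggplot(heatmap_data, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +
  # Print the percentage on each tile
  geom_text(aes(label = percent(pct, accuracy = 1))) +
  # expand = c(0, 0) removes the padding around the tiles
  scale_x_discrete(expand = c(0, 0)) +
  # Reverse the seed axis and label all 16 seeds
  scale_y_reverse(breaks = seq(1, 16), expand = c(0, 0)) +
  scale_fill_gradient(low = "white", high = "steelblue", labels = percent) +
  labs(x = "Tournament finish", y = "Seed", fill = "% of seed")

p
```

Because a later y scale replaces an earlier one in ggplot2, the breaks and expand settings are kept in the single scale_y_reverse call here.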