Great American Beer Festival

Log odds ratio, Logistic regression, TIE Fighter plot

Published

October 19, 2020

Notable topics: Log odds ratio, Logistic regression, TIE Fighter plot

Recorded on: 2020-10-19

Timestamps by: Eric Fletcher

Screencast

Timestamps

pivot_wider

tidyr

Use pivot_wider with values_fill = list(value = 0) from the tidyr package, together with mutate(value = 1), to pivot the medal variable from long to wide: each row gets a 1 in the column for the medal type awarded and a 0 in the columns for the remaining medal types.
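
A minimal sketch of this long-to-wide step. The medals tibble and its rows are made up for illustration; only the medal and value column names come from the description above.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): one row per medal awarded.
medals <- tibble(
  beer_name = c("Beer A", "Beer B", "Beer C"),
  medal     = c("Gold", "Silver", "Bronze")
)

medals_wide <- medals %>%
  mutate(value = 1) %>%              # indicator for the medal that was awarded
  pivot_wider(
    names_from  = medal,             # one column per medal type
    values_from = value,
    values_fill = list(value = 0)    # medal types not awarded become 0
  )

medals_wide
```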

fct_lump

forcats

Use fct_lump from the forcats package to lump together all the beers except for the N most frequent.
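
As a rough illustration, lumping a made-up vector of beer categories down to its two most frequent levels might look like this (the category values are hypothetical):

```r
library(forcats)

# Hypothetical categories: IPA appears 3 times, Stout 2, the rest once.
category <- factor(c("IPA", "IPA", "IPA", "Stout", "Stout", "Pilsner", "Porter"))

# Keep the 2 most frequent levels; everything else becomes "Other".
lumped <- fct_lump(category, n = 2)

levels(lumped)
table(lumped)
```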

str_to_upper

stringr

Use str_to_upper from the stringr package to convert the case of the state variable to uppercase.

fct_relevel

forcats

Use fct_relevel from the forcats package to reorder the medal factor levels.
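
A small sketch of the releveling. The target order (Bronze, Silver, Gold) is an assumption chosen to match the weighting described later in this list; the input vector is made up.

```r
library(forcats)

# By default, factor levels are alphabetical: Bronze, Gold, Silver.
medal <- factor(c("Gold", "Bronze", "Silver"))

# Assumed target order: worst to best.
medal <- fct_relevel(medal, "Bronze", "Silver", "Gold")

levels(medal)
```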

fct_reorder

forcats

Use fct_reorder from the forcats package to sort the beer_name factor levels by n.

glue

Use glue from the glue package to concatenate beer_name and brewery on the y-axis.
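
One way the label building and sorting could fit together. The data, the label column name, and the "name (brewery)" label format are all assumptions for illustration.

```r
library(dplyr)
library(forcats)
library(glue)

# Hypothetical medal counts per beer.
top_beers <- tibble(
  beer_name = c("Beer A", "Beer B", "Beer C"),
  brewery   = c("Brewery X", "Brewery Y", "Brewery Z"),
  n         = c(3, 7, 5)
) %>%
  mutate(
    # as.character() stores the glue result as a plain character column
    label = as.character(glue("{beer_name} ({brewery})")),
    # order the labels by n so a bar chart with label on the y-axis comes out sorted
    label = fct_reorder(label, n)
  )

levels(top_beers$label)
```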

fct_lump

forcats

Use ties.method = "first" within fct_lump to show only the first brewery when a tie exists between them.
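
A sketch of how the tie handling changes the result, using made-up brewery names in which the top two are tied:

```r
library(forcats)

# Hypothetical: breweries A and B are tied with two medals each.
brewery <- factor(c("A", "A", "B", "B", "C"))

# Default ties.method = "min": both tied breweries share rank 1, so both are kept.
default_lump <- fct_lump(brewery, n = 1)

# ties.method = "first": ties are broken by level order, so only A is kept.
first_lump <- fct_lump(brewery, n = 1, ties.method = "first")

levels(default_lump)
levels(first_lump)
```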

state.abb, setdiff

datasets

Use setdiff from the dplyr package and the state.abb built in vector from the datasets package to check which states are missing from the dataset.
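
A minimal sketch of the check, combined with the uppercasing step mentioned earlier. The observed state codes are made up. With plain vectors, base R's setdiff gives the same result as the version re-exported by dplyr, so dplyr is not loaded here.

```r
library(stringr)

# Hypothetical state column with inconsistent case.
observed_states <- c("CA", "co", "Or")

# Normalize case first, otherwise "co" would not match "CO".
observed_states <- str_to_upper(observed_states)

# State abbreviations that never appear in the data.
missing_states <- setdiff(state.abb, observed_states)

length(missing_states)
head(missing_states)
```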

summarize

dplyr

Use summarize from the dplyr package to calculate the number of medals with n_medals = n(), the number of beers with n_distinct, the number of gold medals with sum(), and a weighted medal total with sum(as.integer(medal)). Because medal is an ordered factor, as.integer gives 1 for each bronze, 2 for each silver, and 3 for each gold.
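
A sketch of such a summary on made-up data. Only n_medals is a name taken from the description; n_beers, n_gold, and weighted_medals are hypothetical column names.

```r
library(dplyr)

# Hypothetical awards; medal is an ordered factor Bronze < Silver < Gold.
medals <- tibble(
  state     = c("CA", "CA", "CA", "CO", "CO"),
  beer_name = c("Beer A", "Beer A", "Beer B", "Beer C", "Beer D"),
  medal     = factor(
    c("Gold", "Bronze", "Silver", "Gold", "Gold"),
    levels = c("Bronze", "Silver", "Gold"), ordered = TRUE
  )
)

by_state <- medals %>%
  group_by(state) %>%
  summarize(
    n_medals        = n(),
    n_beers         = n_distinct(beer_name),
    n_gold          = sum(medal == "Gold"),
    # integer codes of the ordered factor: Bronze = 1, Silver = 2, Gold = 3
    weighted_medals = sum(as.integer(medal))
  )

by_state
```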

read_csv

readr

Import the Craft Beers Dataset from Kaggle using read_csv from the readr package.

inner_join

dplyr

Use inner_join from the dplyr package to join the two datasets from Kaggle.

semi_join

dplyr

Use semi_join from the dplyr package to check whether the beer names match those in the Kaggle dataset. This ends up at a dead end, with not enough matches between the datasets.
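
To illustrate the difference between the two joins, here is a sketch on two tiny made-up tables (beer_awards, craft_beers, and the abv column are hypothetical names):

```r
library(dplyr)

# Hypothetical stand-ins for the awards data and the Kaggle data.
beer_awards <- tibble(
  beer_name = c("Beer A", "Beer B", "Beer C"),
  medal     = c("Gold", "Silver", "Bronze")
)
craft_beers <- tibble(
  beer_name = c("Beer B", "Beer D"),
  abv       = c(0.05, 0.07)
)

# semi_join keeps the awards rows that have a match, without adding columns.
matched <- semi_join(beer_awards, craft_beers, by = "beer_name")

# inner_join keeps the same rows but also brings in the other table's columns.
joined <- inner_join(beer_awards, craft_beers, by = "beer_name")

nrow(matched) / nrow(beer_awards)  # share of award-winning beers with a match
```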

bind_log_odds

tidylo

Use bind_log_odds from the tidylo package to show the representation of each beer category for each state compared to the categories across the other states.
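
A rough sketch of the call on made-up counts. The data and the column names state, category, and n are assumptions; log_odds_weighted is the column that bind_log_odds adds.

```r
library(dplyr)
library(tidylo)

# Hypothetical medal counts per state and beer category.
counts <- tibble(
  state    = c("CA", "CA", "CO", "CO"),
  category = c("IPA", "Stout", "IPA", "Stout"),
  n        = c(30, 10, 10, 30)
)

# set = state, feature = category, n = count column
log_odds <- counts %>%
  bind_log_odds(state, category, n)

# Positive values: the category is over-represented in that state.
log_odds %>%
  arrange(desc(log_odds_weighted))
```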

complete

tidyr

Use complete from the tidyr package to turn implicit missing values into explicit missing values.
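
For example, with made-up counts in which one state has no row for one year, complete adds the missing combination. Filling it with 0 through the fill argument is an assumption here; without it the new row would hold NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts: CO has no row for 2020.
counts <- tibble(
  state = c("CA", "CA", "CO"),
  year  = c(2019, 2020, 2019),
  n     = c(5, 7, 3)
)

# Add every state/year combination; missing counts are filled with 0.
completed <- counts %>%
  complete(state, year, fill = list(n = 0))

completed
```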

reorder_within, scale_y_reordered, facet_wrap

tidytext

Use reorder_within and scale_y_reordered from the tidytext package to reorder the bars within each facet panel.

fct_reorder

forcats

Use fct_reorder from the forcats package to reorder the facet panels in descending order.

fill

ggplot2

For the previous plot, use fill = log_odds_weighted > 0 in the ggplot aes argument to highlight the positive and negative values.
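
A sketch of how the within-facet ordering and the fill highlight can be combined in one plot. The data values are made up; state, category, and log_odds_weighted are the assumed column names.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical weighted log odds per state and category.
log_odds <- tibble(
  state             = c("CA", "CA", "CO", "CO"),
  category          = c("IPA", "Stout", "IPA", "Stout"),
  log_odds_weighted = c(1.2, -0.8, -0.5, 0.9)
)

p <- log_odds %>%
  # order categories separately inside each state panel
  mutate(category = reorder_within(category, log_odds_weighted, state)) %>%
  ggplot(aes(log_odds_weighted, category, fill = log_odds_weighted > 0)) +
  geom_col() +
  scale_y_reordered() +                     # strips the suffix reorder_within adds
  facet_wrap(~ state, scales = "free_y") +  # each panel needs its own y-axis
  theme(legend.position = "none")

p
```

The free y scale matters here: reorder_within creates distinct levels per panel, so a shared axis would show every level in every facet.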

add_count, mutate

dplyr

Use add_count from the dplyr package to add a year_total variable, which shows the total awards for each year. Then use it to calculate each state's share of that year's medals using mutate(pct_year = n / year_total).
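
A minimal sketch, assuming the data has already been counted to one row per year and state with a count column n, and dividing by the year_total column that add_count creates. The numbers are made up. If the data instead had one row per medal, the wt argument would be dropped.

```r
library(dplyr)

# Hypothetical medal counts per year and state.
by_year_state <- tibble(
  year  = c(2019, 2019, 2020, 2020),
  state = c("CA", "CO", "CA", "CO"),
  n     = c(30, 10, 20, 20)
)

shares <- by_year_state %>%
  # sum n within each year and store it next to every row of that year
  add_count(year, wt = n, name = "year_total") %>%
  mutate(pct_year = n / year_total)

shares
```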

glm, cbind

stats

Use glm from the stats package to create a logistic regression model to find out if there is a statistical trend in the probability of award success over time.
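
A sketch of such a model for a single state, on made-up counts. With grouped data, the response is written as cbind(successes, failures), here the state's medals against all other medals awarded that year.

```r
# Hypothetical data for one state: n medals out of year_total awarded per year.
one_state <- data.frame(
  year       = 2015:2020,
  n          = c(20, 22, 25, 27, 30, 33),
  year_total = rep(100, 6)
)

# Binomial (logistic) regression on successes vs. failures.
model <- glm(
  cbind(n, year_total - n) ~ year,
  data   = one_state,
  family = "binomial"
)

# A positive year coefficient means the state's share of medals is rising.
summary(model)$coefficients
```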

group_by, summarize, list, glm, mutate, map

broom, purrr

Expand on the previous model by fitting a logistic regression for every state at once, rather than one state at a time, and use the broom package to tidy the results.

conf.int

Use conf.int = TRUE to add confidence bounds to the logistic regression output, then use them to create a TIE Fighter plot showing which states have become more or less frequent medal winners over time.
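
The per-state models and the TIE Fighter plot can be sketched roughly as follows. The data is made up, and the plotting choices (geom_errorbarh, a dashed line at zero) are assumptions rather than the screencast's exact code.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggplot2)

# Hypothetical counts: CA's share rises over time, CO's share falls.
by_year_state <- tibble(
  state      = rep(c("CA", "CO"), each = 6),
  year       = rep(2015:2020, times = 2),
  n          = c(20, 22, 25, 27, 30, 33, 30, 28, 25, 22, 20, 18),
  year_total = 100
)

trends <- by_year_state %>%
  group_by(state) %>%
  # one logistic regression per state, stored in a list column
  summarize(model = list(glm(cbind(n, year_total - n) ~ year, family = "binomial"))) %>%
  # tidy each model, including confidence bounds
  mutate(tidied = map(model, tidy, conf.int = TRUE)) %>%
  unnest(tidied) %>%
  filter(term == "year")

# TIE Fighter plot: point estimate with a horizontal confidence interval.
p <- ggplot(trends, aes(estimate, state)) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
  geom_vline(xintercept = 0, lty = 2)

p
```

Intervals that sit entirely to one side of the dashed line suggest a trend in that direction. Note that geom_errorbarh is deprecated in recent ggplot2 releases, so it may emit a warning.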

state.name, match

datasets

Use the built-in state.name vector with match from base R to change each state abbreviation to the state name.
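
A short base R sketch of the lookup, using a few example abbreviations:

```r
# state.abb and state.name are parallel built-in vectors, so the position of an
# abbreviation in state.abb is also the position of its full name in state.name.
abbreviations <- c("CA", "CO", "OR")

full_names <- state.name[match(abbreviations, state.abb)]

full_names
```

Abbreviations that are not in state.abb (for example "DC") come back as NA.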

Summary of screencast.

Screencast

Timestamps

0:08:20

0:11:25

0:12:25

0:12:25

0:13:25

0:14:30

0:15:00

0:19:25

0:21:25

0:26:05

0:28:00

0:29:40

0:33:05

0:33:35

0:35:30

0:36:40

0:39:35

0:41:45

0:44:40

0:47:15

0:50:25

0:53:00

0:55:00