Great American Beer Festival
Notable topics: Log odds ratio, Logistic regression, TIE Fighter plot
Recorded on: 2020-10-19
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use pivot_wider with values_fill = list(value = 0) from the tidyr package along with mutate(value = 1) to pivot the medal variable from long to wide, adding a 1 for the medal type awarded and a 0 for the remaining medal types in the row.
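A minimal sketch of the long-to-wide pivot described above. The toy data and the column name beer_name are hypothetical, not taken from the screencast dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per awarded medal
medals <- tibble(
  beer_name = c("Alpha Ale", "Beta Bock", "Gamma Gose"),
  medal = c("Gold", "Silver", "Bronze")
)

# Add an indicator column, then spread the medal types into columns;
# medal types a beer did not win are filled with 0
medals_wide <- medals %>%
  mutate(value = 1) %>%
  pivot_wider(
    names_from = medal,
    values_from = value,
    values_fill = list(value = 0)
  )

medals_wide
```

Each row ends up with a 1 in the column of the medal it won and a 0 in the other two.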
Use fct_lump from the forcats package to lump together all the beers except for the N most frequent.
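As a rough illustration of lumping, assuming a hypothetical vector of beer styles and N = 2:

```r
library(forcats)

# Hypothetical toy factor; not the screencast data
styles <- factor(c("IPA", "IPA", "IPA", "Stout", "Stout", "Pilsner", "Gose"))

# Keep the 2 most frequent levels and collapse the rest into "Other"
lumped <- fct_lump(styles, n = 2)

table(lumped)
```

Here IPA and Stout are kept, while Pilsner and Gose are combined into a single "Other" level.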
Use str_to_upper from the stringr package to convert the case of the state variable to uppercase.
Use fct_relevel from the forcats package to reorder the medal factor levels.
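A small sketch of releveling. The target order (Bronze, Silver, Gold) is an assumption chosen for illustration.

```r
library(forcats)

# Hypothetical toy factor; default levels are alphabetical: Bronze, Gold, Silver
medal <- factor(c("Gold", "Silver", "Bronze", "Gold"))

# Move the levels into an explicit order (assumed here: Bronze < Silver < Gold)
medal_releveled <- fct_relevel(medal, "Bronze", "Silver", "Gold")

levels(medal_releveled)
```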
Use fct_reorder from the forcats package to sort the beer_name factor levels by n.
Use glue from the glue package to concatenate beer_name and brewery on the y-axis.
Use ties.method = "first" within fct_lump to show only the first brewery when a tie exists between them.
Use setdiff from the dplyr package and the state.abb built-in vector from the datasets package to check which states are missing from the dataset.
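The check can be sketched as follows, assuming a hypothetical vector of the state abbreviations that appear in the data:

```r
library(dplyr)

# Hypothetical: states that appear in the dataset
states_in_data <- c("CO", "CA", "OR")

# Abbreviations in the built-in state.abb vector that never appear in the data
missing_states <- setdiff(state.abb, states_in_data)

length(missing_states)
```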
Use summarize from the dplyr package to calculate the number of medals with n_medals = n(), the number of beers with n_distinct, the number of gold medals with sum(), and weighted medal totals using sum(as.integer(medal)). Because medal is an ordered factor, this counts 1 for each bronze, 2 for each silver, and 3 for each gold.
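A sketch of that summary on toy data. The column names n_beers, n_gold, and weighted_medals are hypothetical; only n_medals appears in the notes above.

```r
library(dplyr)

# Hypothetical toy data; medal is an ordered factor Bronze < Silver < Gold
medals <- tibble(
  state = c("CO", "CO", "CO", "CA"),
  beer_name = c("A", "A", "B", "C"),
  medal = factor(
    c("Gold", "Bronze", "Silver", "Gold"),
    levels = c("Bronze", "Silver", "Gold"),
    ordered = TRUE
  )
)

by_state <- medals %>%
  group_by(state) %>%
  summarize(
    n_medals = n(),                            # total medals
    n_beers = n_distinct(beer_name),           # distinct beers awarded
    n_gold = sum(medal == "Gold"),             # gold medals only
    weighted_medals = sum(as.integer(medal))   # Bronze = 1, Silver = 2, Gold = 3
  )

by_state
```

as.integer() returns the position of each value among the factor levels, which is what turns the level order into the 1/2/3 weights.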
Import the Craft Beers Dataset from Kaggle using read_csv from the readr package.
Use inner_join from the dplyr package to join together the 2 datasets from Kaggle.
Use semi_join from the dplyr package to see whether the beer names match those in the Kaggle dataset. This ends up at a dead end, with not enough matches between the datasets.
Use bind_log_odds from the tidylo package to show the representation of each beer category for each state compared to the categories across the other states.
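A minimal sketch of the call, assuming a hypothetical table of counts per state and category. The output column name log_odds_weighted is the one referenced later in these notes; older tidylo versions may name it differently.

```r
library(dplyr)
library(tidylo)

# Hypothetical toy counts: medals per state and beer category
counts <- tibble(
  state = c("CO", "CO", "CA", "CA"),
  category = c("IPA", "Stout", "IPA", "Stout"),
  n = c(10, 1, 1, 10)
)

# set = state, feature = category, n = count column
log_odds <- counts %>%
  bind_log_odds(state, category, n)

log_odds
```

Positive values mark categories that are over-represented in a state relative to the other states, and negative values mark under-represented ones.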
Use complete from the tidyr package to turn implicit missing values into explicit missing values.
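For illustration, assuming hypothetical yearly counts where one state-year combination has no row at all:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy counts; CA has no row for 2020
counts <- tibble(
  state = c("CO", "CO", "CA"),
  year = c(2019, 2020, 2019),
  n = c(5, 7, 3)
)

# Add a row for every state/year combination; absent ones get NA in n
completed <- counts %>%
  complete(state, year)

completed
```

The implicit gap (no CA row for 2020) becomes an explicit row with n = NA.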
Use reorder_within and scale_y_reordered from the tidytext package to reorder the bars within each facet panel.
Use fct_reorder from the forcats package to reorder the facet panels in descending order.
For the previous plot, use fill = log_odds_weighted > 0 in the ggplot aes argument to highlight the positive and negative values.
Use add_count from the dplyr package to add a year_total variable which shows the total awards for each year. Then use this to calculate the percent change in total medals per state using mutate(pct_year = n / year_total).
Use glm from the stats package to create a logistic regression model to find out if there is a statistical trend in the probability of award success over time.
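One way to set up such a model, sketched on made-up numbers. The successes/failures response built with cbind and the column names n and year_total are assumptions for illustration, not necessarily the formula used in the screencast.

```r
# Hypothetical toy data for one state: medals won (n) out of all medals
# awarded that year (year_total)
by_year <- data.frame(
  year = 2010:2019,
  n = c(2, 3, 3, 5, 6, 6, 8, 9, 11, 12),
  year_total = rep(100, 10)
)

# Binomial (logistic) regression: successes vs. failures as a function of year
model <- glm(
  cbind(n, year_total - n) ~ year,
  data = by_year,
  family = "binomial"
)

summary(model)
```

A positive coefficient on year suggests the state's probability of winning a medal is rising over time, and a negative one suggests it is falling.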
Expand on the previous model by using the broom package to fit multiple logistic regressions across multiple states instead of fitting one state at a time.
Use conf.int = TRUE to add confidence bounds to the logistic regression output, then use it to create a TIE Fighter plot to show which states become more or less frequent medal winners over time.
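A rough sketch of the per-state fits and the resulting plot. The data are made up, group_modify is just one way to fit a model per group, and the plot is a simplified TIE-fighter-style chart drawn with geom_segment (no end caps) rather than the exact geoms used in the screencast.

```r
library(dplyr)
library(broom)
library(ggplot2)

# Hypothetical toy data: CO trends up, CA trends down
by_state_year <- tibble(
  state = rep(c("CO", "CA"), each = 6),
  year = rep(2010:2015, times = 2),
  n = c(2, 4, 6, 8, 10, 12, 12, 10, 8, 6, 4, 2),
  year_total = 100
)

# Fit one logistic regression per state and keep the year slope with its bounds
slopes <- by_state_year %>%
  group_by(state) %>%
  group_modify(~ tidy(
    glm(cbind(n, year_total - n) ~ year, data = .x, family = "binomial"),
    conf.int = TRUE
  )) %>%
  ungroup() %>%
  filter(term == "year")

# Point estimate with a horizontal interval per state; dashed line marks no trend
tie_plot <- ggplot(slopes, aes(estimate, state)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_segment(aes(x = conf.low, xend = conf.high, yend = state)) +
  geom_point()

tie_plot
```

States whose interval sits entirely to the right of zero are winning a growing share of medals over time, and those entirely to the left a shrinking share.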
Use the state.name built-in vector with match from base R to change each state abbreviation to the state name.
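The lookup can be sketched in base R, assuming a hypothetical vector of abbreviations:

```r
# Hypothetical input abbreviations
abbreviations <- c("CO", "CA", "TX")

# match() finds each abbreviation's position in state.abb;
# the same position in state.name holds the full name
full_names <- state.name[match(abbreviations, state.abb)]

full_names
```

Abbreviations that are not in state.abb (for example "DC") would come back as NA with this approach.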
Summary of screencast.