Cocktails
Notable topics: Pairwise correlation, Network diagram, Principal component analysis (PCA)
Recorded on: 2020-05-25
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use fct_reorder from the forcats package to reorder the ingredient factor levels by n.
Use fct_lump from the forcats package to lump together all levels except the n most frequent in the category and ingredient variables.
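To make the lumping step concrete, here is a small Python sketch of the idea behind fct_lump. It is not the forcats implementation: it keeps the n most frequent levels and relabels everything else as "Other". The function name lump_top_n is hypothetical, and ties at the cutoff are broken by first appearance, which is our own assumption; forcats has its own tie handling, so results can differ when counts are tied.

```python
from collections import Counter


def lump_top_n(values, n, other_level="Other"):
    """Keep the n most frequent levels and relabel all other levels as other_level.

    Ties at the cutoff are broken by first appearance (Counter.most_common order),
    which may differ from how forcats handles ties.
    """
    keep = {level for level, _count in Counter(values).most_common(n)}
    return [v if v in keep else other_level for v in values]


ingredients = ["Gin", "Vodka", "Gin", "Lime juice", "Vodka", "Gin", "Mint"]
print(lump_top_n(ingredients, 2))
# ['Gin', 'Vodka', 'Gin', 'Other', 'Vodka', 'Gin', 'Other']
```

Lumping before plotting keeps a bar chart readable: the long tail of rare ingredients collapses into a single bar instead of dozens of one-count bars.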
Use pairwise_cor from the widyr package to find the correlation between each pair of ingredients.
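As we understand it, pairwise_cor by default computes the Pearson correlation between items across features. With presence/absence data (an ingredient either is or is not in a drink), that is the Pearson correlation of two 0/1 vectors, also known as the phi coefficient. The Python sketch below only shows that arithmetic; the input format (a dict mapping drink name to a set of ingredients) and the function name are hypothetical.

```python
import math
from itertools import combinations


def pairwise_presence_cor(recipes):
    """recipes: dict of drink name -> set of ingredients.

    Returns {(a, b): Pearson correlation of the 0/1 presence vectors of
    ingredients a and b across all drinks}, with a < b alphabetically.
    """
    drinks = list(recipes)
    items = sorted(set().union(*recipes.values()))
    # 1.0 if the ingredient appears in the drink, else 0.0
    presence = {i: [1.0 if i in recipes[d] else 0.0 for d in drinks] for i in items}

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        # An ingredient present in every drink (or none) has zero variance
        if sx == 0 or sy == 0:
            return float("nan")
        return cov / (sx * sy)

    return {(a, b): pearson(presence[a], presence[b]) for a, b in combinations(items, 2)}


recipes = {
    "Mojito": {"rum", "lime", "mint"},
    "Daiquiri": {"rum", "lime"},
    "Martini": {"gin", "vermouth"},
    "Gimlet": {"gin", "lime"},
}
cors = pairwise_presence_cor(recipes)
print(cors[("gin", "rum")])  # -1.0: never in the same drink here
```

A strongly negative value means two ingredients tend to appear in different drinks, while a positive value means they tend to appear together.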
Use reorder_within from the tidytext package with scale_x_reordered to reorder the columns within each facet.
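The trick behind reorder_within, as we understand it, is to paste the facet value onto each label with a separator (tidytext uses "___" by default) so that the same label becomes a distinct level in each facet and can take a different position in each. The levels are then ordered by the value, and scale_x_reordered strips the suffix for display. The Python sketch below mimics that idea; the function names are hypothetical, and it assumes one row per label per facet.

```python
import re

SEP = "___"


def reorder_within(labels, values, facets, sep=SEP):
    """Suffix each label with its facet, then order the suffixed labels by value.

    Returns (suffixed_labels, levels_in_ascending_value_order).
    Assumes each (label, facet) pair occurs once.
    """
    keyed = [f"{lab}{sep}{fac}" for lab, fac in zip(labels, facets)]
    levels = [key for _value, key in sorted(zip(values, keyed))]
    return keyed, levels


def strip_facet_suffix(level, sep=SEP):
    """Drop the separator and facet name, recovering the display label."""
    return re.sub(re.escape(sep) + ".+$", "", level)


keyed, levels = reorder_within(
    labels=["lime", "gin", "lime", "gin"],
    values=[5, 3, 1, 4],
    facets=["A", "A", "B", "B"],
)
print(levels)  # ['lime___B', 'gin___A', 'gin___B', 'lime___A']
print([strip_facet_suffix(lv) for lv in levels])  # ['lime', 'gin', 'gin', 'lime']
```

Within facet A the order is gin then lime, while within facet B it is lime then gin, which a single shared factor ordering could not express.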
Use the ggraph and igraph packages to create a network diagram.
Use extract from the tidyr package with regex = "(.*) oz" to create a new variable amount that excludes the oz unit.
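tidyr's extract applies a regular expression to each string and puts each capture group into its own new column, giving NA where the pattern does not match. For the single-group pattern (.*) oz, a rough Python equivalent is shown below; re.search is used because the R pattern is not anchored, and None stands in for NA. The function name is hypothetical.

```python
import re

OZ_PATTERN = re.compile(r"(.*) oz")


def extract_amount(measure):
    """Return the text before ' oz', or None (standing in for NA) if there is no match."""
    match = OZ_PATTERN.search(measure)
    return match.group(1) if match else None


print(extract_amount("1 1/2 oz"))   # 1 1/2
print(extract_amount("2 dashes"))   # None
```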
Use extract with a regex to split the strings in the new amount variable into separate ones, numerator, and denominator columns.
Use replace_na from the tidyr package to replace NA with zeros in the ones, numerator, and denominator columns. David ends up replacing the zeros in the denominator column with ones, because a zero denominator would make the numerator / denominator division undefined.
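Putting the last two steps together: split an amount such as "1 1/2" into ones, numerator, and denominator, fill missing parts with 0, swap a 0 denominator for 1, and compute ones + numerator / denominator. The regex used in the screencast is not reproduced here, so the pattern below is our own assumption, and the whole block is a Python sketch of the arithmetic rather than the R code.

```python
import re

# Assumed pattern: optional whole number, optional "numerator/denominator"
AMOUNT_PATTERN = re.compile(r"(\d+)?\s*(?:(\d+)/(\d+))?")


def parse_ounces(amount):
    """Convert strings like '2', '3/4', or '1 1/2' to a number of ounces."""
    match = AMOUNT_PATTERN.fullmatch(amount.strip())
    if match is None:
        return None  # not a recognizable amount
    # Missing parts become 0, like replace_na(..., 0)
    ones, numerator, denominator = (int(g) if g else 0 for g in match.groups())
    # With no fraction, numerator and denominator are both 0; use 1 to avoid 0 / 0
    if denominator == 0:
        denominator = 1
    return ones + numerator / denominator


print(parse_ounces("1 1/2"))  # 1.5
print(parse_ounces("3/4"))    # 0.75
```

Setting the denominator to 1 is harmless whenever it was filled in from NA, because the numerator is then 0 as well, so the fraction contributes nothing.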
Use geom_text_repel from the ggrepel package to add ingredient labels to the geom_point plot.
Use na_if from the dplyr package to replace zeros with NA.
Use scale_size_continuous with labels = percent_format() to display the size legend values as percentages.
Make the size of the points in the network diagram proportional to n by using vertices = ingredient_info within graph_from_data_frame and aes(size = n) within geom_node_point.
Use widely_svd from the widyr package to perform principal component analysis on the ingredients.
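widely_svd decomposes the wide item-by-feature matrix (here, ingredients by drinks) with a singular value decomposition and returns each item's value on each dimension. The Python sketch below approximates only the first dimension, using power iteration on the Gram matrix X Xᵀ, whose top eigenvector is the first left singular vector of X. It is an illustration, not widyr's algorithm; it does not center the data, which a textbook PCA would do first, and the function name and the iteration count are assumptions.

```python
import math


def first_component(matrix, iters=200):
    """Approximate the first left singular vector of `matrix`.

    Rows are items (e.g. ingredients), columns are features (e.g. drinks).
    Uses power iteration on the Gram matrix G = X X^T; no centering is applied.
    """
    n_rows = len(matrix)
    # G[i][j] = dot product of row i and row j
    gram = [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n_rows)]
        for i in range(n_rows)
    ]
    v = [1.0] * n_rows  # starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n_rows)) for i in range(n_rows)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize each step to avoid overflow
    return v


# Rows: three ingredients; columns: four drinks (1 = ingredient used)
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
print(first_component(X))  # roughly [0.707, 0.707, 0.0]
```

The first two ingredients always appear together, so they dominate the first dimension with equal loadings, and the third ingredient's loading decays toward zero.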
Use paste0 to concatenate "PC" and the dimension number in the facet panel titles.
Summary of screencast.