IKEA Furniture
Notable topics: Linear model, Coefficient/TIE fighter plot, Boxplots, Log scale discussion, Calculating volume
Recorded on: 2020-11-02
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use fct_reorder from the forcats package to reorder the factor levels for category sorted along n.
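A minimal sketch of this reordering, assuming a hypothetical toy table (ikea_toy) in place of the real dataset:

```r
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for the IKEA dataset
ikea_toy <- tibble(
  category = c("Beds", "Beds", "Beds", "Chairs", "Chairs", "Trolleys")
)

category_counts <- ikea_toy %>%
  count(category) %>%                          # one row per category with its count n
  mutate(category = fct_reorder(category, n))  # levels sorted by ascending n

levels(category_counts$category)
```

With the levels ordered this way, a bar or boxplot places the most frequent category at the top of the y-axis.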
Brief explanation of why scale_x_log10 is needed, given the distribution of price within each category shown with geom_boxplot.
Use geom_jitter with geom_boxplot to show how many items are within each category.
Use add_count from the dplyr package and glue from the glue package to concatenate the category name with category_total on the geom_boxplot y-axis.
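One way this labelling could look, as a sketch on hypothetical toy data (the label format is an assumption, not necessarily the one used in the screencast):

```r
library(dplyr)
library(glue)

# Hypothetical toy data standing in for the IKEA dataset
ikea_toy <- tibble(
  category = c("Beds", "Beds", "Chairs"),
  price    = c(100, 250, 40)
)

labelled <- ikea_toy %>%
  add_count(category, name = "category_total") %>%           # items per category
  mutate(category = glue("{category} ({category_total})"))   # e.g. "Beds (2)"

labelled
```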
Convert from Saudi Riyals to United States Dollars.
Create a ridgeplot (AKA joyplot) using the ggridges package showing the distribution of price across category.
Discussion on distributions and when to use a log scale.
Use fct_lump from the forcats package to lump together all the levels in category except for the n most frequent.
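A small sketch of the lumping, assuming a hypothetical vector of categories and n = 2:

```r
library(forcats)

# Hypothetical category values; counts are 3, 2, 1, and 1
category <- factor(c(
  "Tables & desks", "Tables & desks", "Tables & desks",
  "Beds", "Beds", "Trolleys", "Chairs"
))

# Keep the 2 most frequent levels; everything else becomes "Other"
lumped <- fct_lump(category, n = 2)

table(lumped)
```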
Use scale_fill_discrete from the ggplot2 package with guide = guide_legend(reverse = TRUE) to reverse the fill legend.
Use str_trim from the stringr package to remove leading and trailing whitespace from the short_description variable. David then decides to use str_replace_all instead, with the regular expression "\\s+" and the replacement " ", to replace every run of whitespace with a single space.
Use separate from the tidyr package with extra = "merge" and fill = "right" to separate item description from item dimension.
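A sketch of the split, assuming the description and the rest are separated by ", " (the separator, the toy rows, and the column name rest are assumptions for illustration):

```r
library(dplyr)
library(tidyr)

# Hypothetical short_description values
toy <- tibble(short_description = c(
  "Wardrobe, 150x66x236 cm",
  "Bar table, in/outdoor, 51x51 cm",
  "Chair"
))

separated <- toy %>%
  separate(
    short_description,
    into  = c("main_description", "rest"),
    sep   = ", ",
    extra = "merge",  # extra pieces stay together in the last column
    fill  = "right"   # rows with too few pieces get NA on the right
  )

separated
```

extra = "merge" keeps "in/outdoor, 51x51 cm" in one piece, and fill = "right" leaves NA in rest for rows with no separator rather than raising a warning.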
Use extract from the tidyr package with the regular expression "([\\d\\-xX]+) cm" to extract the numbers before cm.
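A sketch of the extraction, assuming the pattern is wrapped in a capture group (tidyr's extract needs one group per output column). The toy values and the column names rest and numbers are hypothetical:

```r
library(dplyr)
library(tidyr)

# Hypothetical leftover text after the item description has been split off
toy <- tibble(rest = c("150x66x236 cm", "in/outdoor, 51x51 cm", NA))

extracted <- toy %>%
  tidyr::extract(
    rest,
    into   = "numbers",
    regex  = "([\\d\\-xX]+) cm",  # digits, hyphens, and x/X directly before " cm"
    remove = FALSE                # keep the original column for comparison
  )

extracted
```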
Use unite from the tidyr package to paste together the category and main_description columns into a new column named category_and_description.
Calculate the volume of each item in the dataset in liters, given its depth, height, and width, using depth * height * width / 1000. At 36:15, David decides to change to cubic meters instead using depth * height * width / 1000000.
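The two unit conversions can be checked on a single hypothetical item, assuming the dimensions are recorded in centimeters (1 liter = 1,000 cubic centimeters, 1 cubic meter = 1,000,000 cubic centimeters):

```r
library(dplyr)

# Hypothetical item: 50 cm deep, 200 cm high, 100 cm wide
toy <- tibble(depth = 50, height = 200, width = 100)

volumes <- toy %>%
  mutate(
    volume_liters = depth * height * width / 1000,    # cm^3 to liters
    volume_m3     = depth * height * width / 1000000  # cm^3 to cubic meters
  )

volumes
```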
Use str_squish from the stringr package to remove whitespace from the start and end of the short_description variable and collapse repeated whitespace inside it into single spaces.
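The difference between str_trim, str_replace_all with "\\s+", and str_squish is easiest to see side by side. A small sketch on a hypothetical value:

```r
library(stringr)

# Hypothetical value standing in for one short_description entry
x <- "  Wardrobe,   150x66x236 cm  "

trimmed  <- str_trim(x)                      # strips only leading/trailing whitespace
replaced <- str_replace_all(x, "\\s+", " ")  # collapses runs, but one space remains at each end
squished <- str_squish(x)                    # trims the ends and collapses inner runs

c(trimmed, replaced, squished)
```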
Use lm from the stats package to create a linear model on a log-log scale to predict the price of an item based on volume + category. David then uses fct_relevel to reorder the factor levels for category such that tables & desks is first (the reference level), since it's the most frequent level in the category variable and its price distribution is in the middle.
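A sketch of such a model on hypothetical toy data. The column names volume_m3 and price_usd, the numbers, and the use of natural logs are assumptions for illustration:

```r
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for the IKEA dataset
toy <- tibble(
  category  = rep(c("Beds", "Chairs", "Tables & desks"), each = 4),
  volume_m3 = c(1.2, 1.5, 2.0, 2.4, 0.2, 0.3, 0.35, 0.5, 0.4, 0.6, 0.9, 1.1),
  price_usd = c(300, 340, 480, 520, 40, 70, 65, 110, 90, 120, 210, 230)
)

model <- toy %>%
  # Move "Tables & desks" to the front so it becomes the reference level
  mutate(category = fct_relevel(factor(category), "Tables & desks")) %>%
  lm(log(price_usd) ~ log(volume_m3) + category, data = .)

names(coef(model))
```

Every category coefficient is then read relative to tables & desks, which is why a frequent, mid-priced level makes a convenient baseline.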
Use the broom package to turn the model output into a coefficient / TIE fighter plot.
Use str_remove from the stringr package to remove category from the start of the strings on the y-axis using the regular expression "^category".
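The last two steps can be sketched together: tidy the model with confidence intervals, strip the "category" prefix from the term names, and draw points with horizontal error bars. The toy data and column names are hypothetical, and the plot styling is an assumption, not the exact code from the screencast:

```r
library(dplyr)
library(stringr)
library(broom)
library(ggplot2)

# Hypothetical toy data standing in for the IKEA dataset
toy <- tibble(
  category  = rep(c("Beds", "Chairs", "Tables & desks"), each = 4),
  volume_m3 = c(1.2, 1.5, 2.0, 2.4, 0.2, 0.3, 0.35, 0.5, 0.4, 0.6, 0.9, 1.1),
  price_usd = c(300, 340, 480, 520, 40, 70, 65, 110, 90, 120, 210, 230)
)

model <- lm(log(price_usd) ~ log(volume_m3) + category, data = toy)

coefs <- tidy(model, conf.int = TRUE) %>%             # one row per term, with conf.low/conf.high
  filter(term != "(Intercept)") %>%
  mutate(term = str_remove(term, "^category"))        # "categoryChairs" becomes "Chairs"

# Point plus horizontal interval gives the TIE fighter shape.
# Newer ggplot2 releases may warn that geom_errorbarh() is deprecated.
p <- ggplot(coefs, aes(estimate, term)) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed")
```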
Summary of screencast.