GDPR Violations
Notable topics: Data manipulation, interactive dashboard with shinymetrics and tidymetrics
Recorded on: 2020-04-20
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use the mdy function from the lubridate package to change the date variable from character class to Date class.
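A minimal sketch of this step, assuming the raw dates are month/day/year strings (the sample values below are made up for illustration):

```r
library(lubridate)

# Hypothetical character dates in month/day/year order
date_chr <- c("04/20/2020", "12/31/2019")

# mdy() parses the strings into Date objects
date_parsed <- mdy(date_chr)

class(date_chr)    # "character"
class(date_parsed) # "Date"
```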
Use the rename function from the dplyr package to rename a variable in the dataset.
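As a rough illustration, rename takes new_name = old_name pairs. The column names here (name, country) are assumptions, not confirmed by the notes:

```r
library(dplyr)

# Hypothetical data: the column holding country names is called `name`
fines <- tibble(name = c("Spain", "Germany"), price = c(1000, 2000))

# rename() uses new_name = old_name
fines <- fines %>% rename(country = name)

names(fines) # "country" "price"
```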
Use the fct_reorder function from the forcats package to sort the bars drawn by geom_col in descending order.
Use the fct_lump function from the forcats package within count to lump together all country names except the 6 most frequent.
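The lumping step might look like the sketch below. The country counts are invented so that the six most frequent levels are unambiguous:

```r
library(dplyr)
library(forcats)

# Made-up data: nine countries with distinct frequencies (8, 7, ..., 2, 1, 1)
fines <- tibble(
  country = rep(
    c("Spain", "Germany", "Romania", "Italy", "France",
      "Greece", "Hungary", "Malta", "Cyprus"),
    times = c(8, 7, 6, 5, 4, 3, 2, 1, 1)
  )
)

# Keep the 6 most frequent countries and collapse the rest into "Other"
top_countries <- fines %>%
  count(country = fct_lump(country, n = 6), sort = TRUE)
```

Here Hungary, Malta, and Cyprus are collapsed into a single "Other" row, so the result has seven rows.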
Use the scale_x_continuous function from ggplot2 with the scales package to change the x-axis values to dollar format.
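The sorted bar chart with a dollar-formatted axis can be sketched as follows. The totals are placeholder numbers, and the horizontal layout (numeric x, discrete y) assumes ggplot2 3.3.0 or later:

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(scales)

# Made-up totals, for illustration only
totals <- tibble(
  country = c("Italy", "France", "Germany"),
  total_fine = c(3e6, 5e6, 1e6)
)

# Reorder the factor levels by total so the largest bar is drawn on top
totals <- totals %>%
  mutate(country = fct_reorder(country, total_fine))

# labels = dollar formats the axis ticks as currency
p <- ggplot(totals, aes(total_fine, country)) +
  geom_col() +
  scale_x_continuous(labels = dollar)
```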
Use the month and floor_date functions from the lubridate package to extract the month from the date variable and count the total fines per month.
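A small sketch of the two functions, using invented dates and amounts. floor_date keeps the year and rounds down to the first of the month, while month returns only the month number:

```r
library(dplyr)
library(lubridate)

# Made-up fines
fines <- tibble(
  date = as.Date(c("2019-10-03", "2019-10-21", "2019-11-05")),
  price = c(100, 200, 300)
)

# Round each date down to the first day of its month, then total per month
monthly <- fines %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarize(total = sum(price), n_fines = n())

# month() gives just the month component
month_numbers <- month(fines$date) # 10 10 11
```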
Use the na_if function from the dplyr package to convert a specific date value to NA.
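For instance, if a placeholder date stands in for "unknown", na_if can blank it out. The placeholder used here (1970-01-01) is an assumption for the sketch, not taken from the notes:

```r
library(dplyr)

# Hypothetical dates where 1970-01-01 marks a missing value
dates <- as.Date(c("2019-10-03", "1970-01-01"))

# Replace the placeholder with NA; other values are left untouched
cleaned <- na_if(dates, as.Date("1970-01-01"))
```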
Use the fct_reorder function from the forcats package to sort the stacked geom_col and legend labels in descending order.
Use the dollar function from the scales package to convert the price variable into dollar format.
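A quick illustration with made-up amounts; note that the result is a character vector, so it suits labels and tables rather than further arithmetic:

```r
library(scales)

# Made-up fine amounts
price <- c(1000, 2500000)

price_label <- dollar(price) # "$1,000" "$2,500,000"
```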
Use the str_trunc function from the stringr package to shorten the summary string values to 140 characters.
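A minimal sketch with a synthetic 300-character string. By default str_trunc cuts on the right and the trailing "..." counts toward the width:

```r
library(stringr)

# Synthetic long summary text
summary_text <- strrep("a", 300)

# Truncate to a total width of 140 characters, ellipsis included
short_text <- str_trunc(summary_text, 140)

nchar(short_text) # 140
```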
Use the separate_rows function from the tidyr package with a regular expression to separate the values in the article_violated variable with each matching group placed in its own row.
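The row-splitting step might look like this, assuming the articles are joined by a pipe character (the delimiter and sample values are assumptions). Because sep is treated as a regular expression, the pipe has to be escaped:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with several articles packed into one cell
fines <- tibble(
  id = 1:2,
  article_violated = c("Art. 5 GDPR|Art. 6 GDPR", "Art. 32 GDPR")
)

# One output row per article; the other columns are repeated
separated <- fines %>%
  separate_rows(article_violated, sep = "\\|")
```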
Use the extract function from the tidyr package with a regular expression to turn each matching group into a new column.
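As a sketch, a single capture group can pull the article number into its own column. The regular expression and the new column name article_number are illustrative choices, not the ones from the screencast:

```r
library(dplyr)
library(tidyr)

# Hypothetical article strings
articles <- tibble(
  article_violated = c("Art. 5 GDPR", "Art. 6 GDPR", "Art. 32 GDPR")
)

# Each parenthesized group in the regex becomes one new column;
# convert = TRUE turns the captured digits into integers
extracted <- articles %>%
  tidyr::extract(
    article_violated, "article_number", "Art\\. ?(\\d+)",
    remove = FALSE, convert = TRUE
  )
```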
Use the geom_jitter function from the ggplot2 package to add points to the horizontal box plot.
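A rough version of this plot with invented data. Jitter is applied only vertically so that the price values on the x-axis stay exact; the horizontal layout assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Made-up fines per article
fines <- data.frame(
  article = rep(c("Art. 5", "Art. 32"), each = 4),
  price = c(1000, 5000, 20000, 100000, 500, 2500, 8000, 40000)
)

# Box plot with the raw points overlaid
p <- ggplot(fines, aes(price, article)) +
  geom_boxplot() +
  geom_jitter(width = 0, height = 0.2, alpha = 0.5)
```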
Use the inner_join function from the dplyr package to join together the article_titles and separated_articles tables.
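The join could be sketched as below. The table contents and the shared key column article are assumptions; an inner join keeps only rows whose key appears in both tables:

```r
library(dplyr)

# Hypothetical tables sharing an `article` key
separated_articles <- tibble(id = c(1, 1, 2), article = c(5, 6, 32))
article_titles <- tibble(
  article = c(5, 32),
  article_title = c(
    "Principles relating to processing of personal data",
    "Security of processing"
  )
)

# Article 6 has no title in article_titles, so its row is dropped
joined <- inner_join(separated_articles, article_titles, by = "article")
```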
Use the paste0 function from base R to concatenate article and article_title.
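For example, with a ". " separator chosen here purely for illustration:

```r
# Hypothetical values
article <- c(5, 32)
article_title <- c(
  "Principles relating to processing of personal data",
  "Security of processing"
)

# paste0() concatenates element-wise with no separator of its own
label <- paste0(article, ". ", article_title)
```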
Use the str_detect function from the stringr package to detect the presence of a pattern in a string.
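A small sketch with made-up summaries; the logical vector it returns is typically used inside filter:

```r
library(stringr)

# Made-up summary strings
summaries <- c(
  "Failure to implement sufficient security measures",
  "Unlawful processing of personal data"
)

# TRUE where the pattern occurs anywhere in the string
has_security <- str_detect(summaries, "security") # TRUE FALSE
```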
Use the group_by and summarize functions from the dplyr package to aggregate fines that were issued to the same country on the same day, allowing the total to be mapped to size in the geom_point plot.
Use the scale_size_continuous function from the ggplot2 package to remove the size legend.
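The aggregation and the legend removal can be sketched together. The data are invented, and guide = "none" is one way to suppress the size legend:

```r
library(dplyr)
library(ggplot2)

# Made-up fines: Spain has two on the same day
fines <- tibble(
  country = c("Spain", "Spain", "Italy"),
  date = as.Date(c("2019-10-03", "2019-10-03", "2019-11-05")),
  price = c(100, 200, 300)
)

# One row per country and day, with the fines summed
by_day <- fines %>%
  group_by(country, date) %>%
  summarize(total = sum(price), .groups = "drop")

# Point size reflects the daily total; the size legend is hidden
p <- ggplot(by_day, aes(date, country, size = total)) +
  geom_point() +
  scale_size_continuous(guide = "none")
```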
Create an interactive dashboard using the shinymetrics and tidymetrics packages, which offer a tidy approach to business intelligence.
Use the cross_by_dimensions and cross_by_periods functions from the tidymetrics package. cross_by_dimensions stacks an extra copy of the table for each dimension specified as an argument (country, article_title, type), replaces the value of that column with the word All, and groups by all the columns. It acts as an extended group_by that allows complete summaries across each individual dimension and possible combinations. cross_by_periods expands the table in a similar way across the specified periods.
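To make the "extended group_by" idea concrete, here is a plain dplyr emulation of the stacking behavior described above. It does not call tidymetrics; the helper stack_all and the sample data are hypothetical:

```r
library(dplyr)

# Made-up fines with two dimensions
fines <- tibble(
  country = c("Spain", "Spain", "Italy"),
  type = c("A", "B", "A"),
  price = c(100, 200, 300)
)

# Hypothetical helper: append a copy of the table with one column set to "All"
stack_all <- function(tbl, col) {
  all_copy <- tbl
  all_copy[[col]] <- "All"
  bind_rows(tbl, all_copy)
}

# Stack once per dimension, then group by every dimension and summarize
crossed <- fines %>%
  stack_all("country") %>%
  stack_all("type") %>%
  group_by(country, type) %>%
  summarize(total = sum(price), .groups = "drop")
```

The result holds totals for every observed country and type pair, plus rows where either or both are "All", so the ("All", "All") row is the grand total.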