R Downloads
Notable topics: Data manipulation (especially time series)
Recorded on: 2018-10-29
Timestamps by: Alex Cookson
Timestamps
Using geom_line function to visualize changes over time
Starting to decompose time series data into day-of-week trend and overall trend (lots of lubridate package functions)
Using floor_date function from lubridate package to round dates down to the week level
Using min function to drop incomplete/partial week at the start of the dataset
Using countrycode function from countrycode package to replace two-letter country codes with full names (e.g., "CA" becomes "Canada")
Using fct_lump function to get top N categories within a categorical variable and classify the rest as "Other"
Using hour function from lubridate package to pull out integer hour value from a datetime variable
Using facet_wrap function to graph small multiples of downloads by country, then changing scales argument to allow different scales on y-axis
Starting analysis of downloads by IP address
Using as.POSIXlt function to combine separate date and time variables into a single datetime variable
Using lag function to calculate time between downloads (time between events) per IP address (comparable to SQL window function)
Using as.numeric function to convert variable from a time interval object to a numeric variable (number of seconds)
Explanation of a bimodal log-normal distribution
Handy trick for setting easy-to-interpret intervals for time data on scale_x_log10 function's breaks argument
Starting to explore package downloads
Adding 1 to the numerator and denominator when calculating a ratio to get around dividing by zero
Showing how to look at package download data over time using cran_downloads function from the cranlogs package
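
The IP-address steps listed above (combining date and time, using lag to get the time between downloads, and converting the result to seconds) can be sketched roughly as follows. This is a minimal illustration, not the code from the screencast: the `downloads` data and its column names (`date`, `time`, `ip_id`) are made up for the example, and it uses as.POSIXct rather than as.POSIXlt because POSIXct values store more cleanly as a data frame column.

```r
library(dplyr)

# Hypothetical download log: one row per download, with separate
# date and time columns and an anonymised IP identifier
downloads <- tibble(
  date  = c("2018-10-22", "2018-10-22", "2018-10-22", "2018-10-23", "2018-10-23"),
  time  = c("09:00:00", "09:00:30", "10:15:00", "09:00:00", "09:02:00"),
  ip_id = c(1, 1, 2, 1, 2)
)

gaps <- downloads %>%
  # Combine date and time into one datetime
  # (as.POSIXct swapped in for as.POSIXlt; see the note above)
  mutate(datetime = as.POSIXct(paste(date, time), tz = "UTC")) %>%
  # Sort so that lag() looks at the previous download from the same IP
  arrange(ip_id, datetime) %>%
  group_by(ip_id) %>%
  mutate(
    # Previous download time within each IP (NA for the first one),
    # comparable to a SQL window function
    previous = lag(datetime),
    # Fix the units to seconds before converting to a plain number
    seconds_since_last = as.numeric(difftime(datetime, previous, units = "secs"))
  ) %>%
  ungroup()

gaps
```

Setting `units = "secs"` explicitly is a deliberate choice here: subtracting two datetimes lets R pick the units automatically, so a bare as.numeric call could return minutes or hours depending on the size of the gap.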