Dolphins
Notable topics: Survival analysis
Recorded on: 2018-12-17
Timestamps by: Alex Cookson
Screencast
Timestamps
Using year function from lubridate package to simplify calculating age of dolphins
Combining count and fct_lump functions to get counts of top 5 species (with other species lumped in "Other")
Creating boxplot of species and age
Dealing with different types of NA (double, logical); he doesn't get it working in this case, but it's still useful
Adding acquisition type as colour dimension to histogram
Creating a spinogram of acquisition type over time (alternative to histogram) using geom_area
Binning year into decade using integer division operator %/%
Fixing triangular gaps in spinogram using complete function to fill in missing combinations in the data
Using fct_reorder function to reorder acquisition type (bigger categories are placed on the bottom of the spinogram)
Adding vertical dashed reference line using geom_vline function
Starting analysis of acquisition location
Matching messy text data with regex to aggregate it into a few categories, using fuzzyjoin package
Using distinct function's .keep_all argument to keep only one row per animal ID
Using coalesce function to conditionally replace NAs (same functionality as the SQL function of the same name)
Starting survival analysis
Using survfit function from survival package to get a baseline survival curve (i.e., not regressed on any independent variables)
Fixing cases where death year is before birth year
Fixing specification of survfit model to better fit the format of our data (right-censored data)
Built-in plot of baseline survival model (estimation of percentage survival at a given age)
Using broom package to tidy the survival model data (which is better for ggplot2 plotting)
Fitting survival curve based on sex
Cox proportional hazards model (to investigate association of survival time and one or more predictors)
Explanation of why data for dolphins with unknown sex is likely to be systematically biased
Investigating whether being born in captivity is associated with different survival rates
Summary of screencast
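
A minimal sketch of two techniques from the list above: binning a year into a decade with %/%, and fitting a baseline survival curve to right-censored data with survfit. The data frame, its column names (birth_year, age, died) and all values are made up for illustration and are not taken from the screencast. It assumes the survival package is installed (it ships with standard R distributions).

```r
library(survival)

# Hypothetical toy data: NOT the dolphin dataset used in the screencast.
dolphins <- data.frame(
  birth_year = c(1972, 1989, 1990, 2005, 1968),
  age        = c(5, 12, 30, 8, 20),  # age at death, or at last observation
  died       = c(1, 0, 1, 1, 0)      # 1 = death observed, 0 = right-censored
)

# Bin year into decade with integer division:
# 1989 %/% 10 is 198, and 198 * 10 is 1980.
dolphins$decade <- 10 * (dolphins$birth_year %/% 10)

# Baseline (Kaplan-Meier) survival curve with no predictors: "~ 1".
# Surv(time, event) marks which rows are observed deaths and which are censored.
fit <- survfit(Surv(age, died) ~ 1, data = dolphins)

print(dolphins$decade)
summary(fit)
```

summary(fit) lists the estimated proportion surviving at each observed death age. For plotting with ggplot2, broom::tidy(fit) should return the same estimates as a data frame, assuming the broom package is available.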