R Data Wrangling: tidyverse packages tidyr & dplyr
Dawn Koffman is a Statistical Programmer at the Office of Population Research at Princeton University. She earned an MS in Computer Science from the University of Wisconsin-Madison and an MPH in Epidemiology and Biostatistics from UMDNJ and Rutgers University.
This workshop, presented as part of Princeton's Research Computing Winter 2021 Bootcamp, introduces two modern R packages, both written by Hadley Wickham and part of R’s “tidyverse,” that provide intuitive tools for handling common data management tasks. The first package, tidyr, provides functions that reshape data so that it conforms to a specific “tidy” structure, in which each variable is saved in its own column, each observation is saved in its own row, and each type of observational unit is stored in a separate table. The second package, dplyr, provides a set of functions (referred to as “verbs”) that allow you to easily subset observations, re-order observations, select specific variables, add new variables, group observations, and summarize groups of observations.
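As a rough illustration of the two packages described above, the sketch below reshapes a small "wide" table into tidy form with tidyr and then applies several dplyr verbs to it. It is not taken from the workshop materials: the data frame `wide`, its column names, and its values are invented for this example.

```r
library(tidyr)
library(dplyr)

# Hypothetical "wide" data (invented for illustration):
# one row per country, one population column per year.
wide <- data.frame(
  country  = c("A", "B", "C"),
  pop_2019 = c(10, 20, 30),
  pop_2020 = c(12, 18, 33)
)

# tidyr: reshape so that each variable (country, year, pop) has its own
# column and each country-year observation has its own row.
tidy <- pivot_longer(
  wide,
  cols         = starts_with("pop_"),
  names_to     = "year",
  names_prefix = "pop_",
  values_to    = "pop"
)

# dplyr verbs, chained with the pipe:
result <- tidy %>%
  mutate(year = as.integer(year)) %>%   # add/modify a variable
  select(country, year, pop) %>%        # select specific variables
  filter(pop >= 12) %>%                 # subset observations
  arrange(country, year) %>%            # re-order observations
  group_by(country) %>%                 # group observations
  summarize(                            # summarize each group
    mean_pop = mean(pop),
    n_years  = n()
  )

print(tidy)
print(result)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is why the steps can be chained in this way.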
Presentation, demo and hands-on.
Attendees should have R, RStudio and the R packages tidyr and dplyr installed on their machines prior to the workshop.
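One possible way to check the package prerequisite from the R console is sketched below. It assumes a working internet connection and a configured CRAN mirror; the variable names are illustrative only.

```r
# Packages required for the workshop.
required_pkgs <- c("tidyr", "dplyr")

# TRUE for each package that can already be loaded.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

# Install only the packages that are not yet available.
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```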
tidyr presentation (PDF)
dplyr script (ZIP)
dplyr presentation (PDF)