Introduction to R Packages for Data Management: tidyr and dplyr
Dawn Koffman is a Statistical Programmer at the Office of Population Research at Princeton University. She earned an MS in Computer Science from the University of Wisconsin-Madison and an MPH in Epidemiology and Biostatistics from UMDNJ and Rutgers University.
This workshop introduces two modern R packages, both written by Hadley Wickham, that provide intuitive tools for handling common data management tasks. The first package, tidyr, provides functions that reshape data so that it conforms to a specific “tidy” structure, in which each variable is stored in its own column, each observation is stored in its own row, and each type of observational unit is stored in its own table. The second package, dplyr, provides a set of functions (referred to as “verbs”) that let you subset observations, re-order observations, select specific variables, add new variables, group observations, and summarize groups of observations.
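As a rough illustration of the two packages working together, the sketch below reshapes a small wide table into tidy form and then applies each of the dplyr verbs listed above. The data frame and its column names (wide, country, pop_2000, pop_2010) are made up for this example and are not workshop material. It also assumes tidyr 1.0.0 or later, where pivot_longer() is available; older versions would use gather() for the same step.

```r
library(tidyr)
library(dplyr)

# Hypothetical "wide" data: one row per country, one column per year.
wide <- data.frame(
  country  = c("A", "B"),
  pop_2000 = c(10, 20),
  pop_2010 = c(12, 26)
)

# tidyr: reshape so that each row holds one country-year observation
# and each variable (country, year, pop) has its own column.
tidy <- pivot_longer(
  wide,
  cols         = c(pop_2000, pop_2010),
  names_to     = "year",
  names_prefix = "pop_",
  values_to    = "pop"
)

# dplyr: one call per verb, chained with the pipe operator.
result <- tidy %>%
  mutate(year = as.integer(year)) %>%  # add (here, convert) a variable
  filter(pop > 10) %>%                 # subset observations
  arrange(country, year) %>%           # re-order observations
  select(country, year, pop) %>%       # select specific variables
  group_by(country) %>%                # group observations
  summarise(mean_pop = mean(pop))      # summarize each group

print(result)
```

Here tidy has four rows (two countries by two years), and result has one row per country holding the mean of the populations that passed the filter.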