Introduction to an R Package for Text Analysis: stm

Brandon Stewart is an Assistant Professor of Sociology at Princeton University where he is also affiliated with the Politics Department, the Office of Population Research and the Center for the Digital Humanities. He develops new quantitative statistical methods for applications across computational social science, and is an author of several R packages, including stm, an R package that provides text analysis tools for working within the general framework defined by the Structural Topic Model. Brandon Stewart earned his PhD in Government at Harvard in 2015 and a Master’s degree in Statistics in 2014, also at Harvard.
5/03/2016 from 9:30 AM to 12:00 PM ~ Wallace 300

The Structural Topic Model is a general framework for topic modeling with document-level covariate information. The covariates can improve inference and qualitative interpretability, and they are allowed to affect topical prevalence, topical content, or both. The software package implements the estimation algorithms for the model and also includes tools for every stage of a standard workflow, from reading in and processing raw text through making publication-quality figures. The workshop will provide a hands-on introduction to the stm package, which currently includes functionality to:

  • ingest and manipulate text data
  • estimate Structural Topic Models
  • calculate covariate effects on latent topics with uncertainty
  • estimate a graph of topic correlations
  • compute model diagnostics and summary measures
  • create the plots used in various papers about stm

Attendees should have previous R experience.
The session combines lecture, discussion, and hands-on exercises.
Attendees should bring a laptop with R and the R package stm already installed. The stm package is available on CRAN and can be installed using: install.packages("stm")
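
For attendees who want to confirm that their installation works before the session, the workflow described above can be sketched roughly as follows. This is a minimal illustration rather than the workshop's own material. It assumes the gadarian example data set that ships with stm, an arbitrary choice of three topics, and that the tm and SnowballC packages used by textProcessor are also installed.

```r
# A minimal sketch of an stm workflow, assuming the bundled `gadarian` data.
library(stm)

# Ingest and manipulate text: tokenize, stem, and drop stopwords.
processed <- textProcessor(documents = gadarian$open.ended.response,
                           metadata = gadarian, verbose = FALSE)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta,
                     verbose = FALSE)

# Estimate a Structural Topic Model in which topical prevalence depends on
# the treatment indicator and a smooth function of party identification.
# K = 3 is an illustrative choice, not a recommendation.
set.seed(2138)
fit <- stm(documents = out$documents, vocab = out$vocab, K = 3,
           prevalence = ~ treatment + s(pid_rep),
           data = out$meta, verbose = FALSE)

# Covariate effects on topic prevalence, with uncertainty.
effects <- estimateEffect(1:3 ~ treatment + s(pid_rep), fit,
                          metadata = out$meta, uncertainty = "Global")
summary(effects)

# Graph of topic correlations (plotting it additionally requires igraph).
corr <- topicCorr(fit)

# Model diagnostics and summary measures.
coherence <- semanticCoherence(fit, out$documents)
exclus <- exclusivity(fit)
labelTopics(fit)

# Summary plot of expected topic proportions.
plot(fit, type = "summary")
```

If this script runs without errors, the installation should be ready for the hands-on exercises. The exact topics found will depend on the preprocessing options and the number of topics chosen.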