OPR Data Archive

Welcome to OPR's Data Archive. This page provides a quick overview of our holdings and links to other data resources including joint projects hosted elsewhere.

New Data Release

The Mexican Migration Project (MMP), an ongoing multidisciplinary study of migration from Mexico to the United States, has released data for 150 communities (MMP150), which includes the original 143 communities plus 9 additional communities: five from the state of Queretaro and four from the state of Tabasco. The MMP150 has information on 24,032 Mexican households and 957 U.S. households, along with individual-level data on 157,879 persons. These data contain information on 8,052 household heads with migration experience to the U.S. and on 49 household heads with Canadian migration experience.

Data are available in four formats: SAS, SPSS, Stata, and CSV. This project requires registration to download data. Users of the previous data are encouraged to download the new MMP150 dataset.

The Wiki Surveys: Open and Quantifiable Social Data Collection project seeks to develop new online methods of social data collection. This data and code release enables others to replicate and extend the results in Salganik and Levy (2015).

The data release includes 6 raw data CSV files, 6 pre-cleaned CSV files, 4 CSV files related to data cleaning, 6 R program code files, 1 bash script, and 1 documentation file in PDF format.

The Monitoring Mt. Laurel Study aims to provide information on the experiences of residents of the Ethel Lawrence Homes (ELH), a mixed-income affordable housing development in Mt. Laurel, NJ, vis-à-vis a comparison sample of non-residents. It also surveys residents of the town about their opinions on affordable housing and how the construction of the Ethel Lawrence Homes has affected the community.

The Adult Head of Household, Child, and Neighbor sample datasets are now available on a restricted basis to researchers who agree to the terms specified in the data agreement. Please contact Prof. Massey if you have questions regarding the data.

The Fragile Families and Child Wellbeing Study, conducted by the Bendheim-Thoman Center for Research on Child Wellbeing (CRCW), has now released the national and city-level weights for mothers and fathers. These weights are for the Nine-Year Follow-Up (Wave 5) public use data files.

The Fragile Families and Child Wellbeing Study follows a cohort of nearly 5,000 children born in the U.S. between 1998 and 2000. The study oversamples births to unmarried couples; and, when weighted, the data are representative of births in large U.S. cities at the turn of the century.

The Latin American Migration Project (LAMP), which extends the MMP design to a study of migration flows originating in other Latin American countries, has released data for four communities in Ecuador (LAMP-ECU4).

LAMP-ECU4 provides information on 4 communities, 803 households, and 4,732 people, including data from Puerto Rico, the Dominican Republic, El Salvador, Nicaragua, Costa Rica, Paraguay, Peru, Haiti, Colombia, and Ecuador. Registration is required to access the data.

The Network Scale-up Method for Heavy Drug Users study (NSUM) was conducted to evaluate the method for estimating the sizes of groups most at risk for HIV/AIDS. Using four different data sources, the authors produced five estimates of the number of heavy drug users in Curitiba, Brazil. This data release includes three data files in comma-separated values (CSV) format, three R programs, a documentation file for the data and R code, and the questionnaire instruments (in Portuguese) as Portable Document Format (PDF) files.

The Survey of Unemployed Workers in New Jersey (Krueger and Mueller, 2011) invited unemployed workers to participate in the study each week for up to 12 weeks (and an additional 12 weeks for some). The study released two data files: (1) the Entry Survey Public Use Data file, which has demographic, income, and wealth information on 6,025 unemployed workers sampled from the universe of roughly 360,000 individuals receiving Unemployment Insurance (UI) benefits in New Jersey as of September 28, 2009; and (2) the Weekly Follow-up Survey Public Use Data file, which contains focused information on job search activities, reservation wages, and receipt of job offers. The Weekly data contain 39,201 person-week observations overall.

The Game of Contacts (GC) data were collected as nested items in a behavioral surveillance study of heavy drug users in Curitiba, Brazil. This public data release includes two data files on 294 respondents in comma-separated values (CSV) format, 13 R programs, data documentation, and a copy of the interviewer form (in Portuguese) used to record the game of contacts data. By running the R programs, one can reproduce all the graphical and tabular results reported in Salganik et al. (2010) "The Game of Contacts: Estimating the Social Visibility of Groups," Social Networks.

The Addis Ababa Mortality Surveillance Project (AAMSP) is hosted by Addis Ababa University and revolves around surveillance of burials at all known cemeteries of Addis Ababa, Ethiopia. This release of the public use data pertains to the first five years of the burial surveillance, starting in 2001, and includes a set of adult verbal autopsy interviews conducted in 2004. Registration is required to access the data.

Project90 was a prospective study of the influence of network structure on the dynamics of HIV transmission in a community of high-risk heterosexuals. The data were collected between 1988 and 1992 in Colorado Springs, CO. Stephen Muth and John Potterat kindly provided the data to Sharad Goel and Matthew Salganik in 2007, and the data were later used in their paper, S. Goel and M. J. Salganik (2010) "Assessing respondent-driven sampling," Proceedings of the National Academy of Sciences (PNAS). The release of these data allows others to replicate the analyses of Goel and Salganik.

The Immigrant Identity Project (IIP), also known as Transnational Identities and Behavior: An Ethnographic Comparison of First and Second Generation Latino Immigrants, has released data for public use, including a quantitative data sheet, 165 interview transcripts (personal and place names are masked), and 306 pictures related to American/Latino identity taken by the respondents themselves. Registration is required to access the data.

The Texas Higher Education Opportunity Project (THEOP) is a multi-year study that investigates college planning and enrollment behavior under a policy that guarantees admission to any Texas public college or university to high school seniors who graduate in the top decile of their class. In December 2008, THEOP released administrative data consisting of College Application Data and College Transcript Data obtained from nine Texas universities. It also released the Sophomore Cohort Wave 2 Survey Data in February 2009.

The Success and Failure in Cultural Markets project (CM), motivated by puzzling aspects of contemporary cultural markets, has released data from a series of four web-based experiments involving a total of 27,267 participants. Included in this release are 167 data files, 48 music files (mp3 format), and detailed documentation. The experiments were conducted by Prof. Matthew J. Salganik between 2004 and 2007.

The National Longitudinal Survey of Freshmen (NLSF) has released the wave 4 (Junior in Spring 2002) and wave 5 (Senior in Spring 2003) public use datasets. Information on participants' graduation from college is available in a separate graduation dataset. The two final waves contain information similar to wave 2 (Freshman in Spring 2000) and wave 3 (Sophomore in Spring 2001), as well as detailed information on topics including extracurricular group involvement; health and emotional problems; college debts; future plans for employment, career, and higher education; respondents' perceptions of their own and other racial and ethnic groups in terms of identity; and incidents of discrimination and prejudice. The NLSF follows a cohort of first-time freshmen at selective colleges and universities through their college careers. Equal numbers of whites, blacks, Hispanics, and Asians were sampled at each of the 28 participating schools, with nearly 4,000 respondents.

The New Immigrant Survey (NIS) is a panel survey of a nationally representative sample of new legal immigrants to the United States. The first full cohort (NIS2003-1) data are now available for download.

The Social Environment and Biomarkers of Aging Study (SEBAS) is an unusually rich, population-based longitudinal study focusing on the health and well-being of older persons in Taiwan. SEBAS explores the relationship between life challenges and mental and physical health, the impact of social environment on the health and well-being of the elderly, and biological markers of health and stress. SEBAS is a joint project of Georgetown University's Center for Population and Health (CPH) and OPR. Public use data from the project are available at ICPSR under study 3792.

The following are historic datasets archived at OPR. When the data are officially disseminated by others, OPR's copy is for internal use only.

Datasets connected with the Princeton European Fertility Project, including the famous Hutterite fertility data first analyzed by Mindel Sheps and later used to establish standards for the analysis of the European fertility decline.

U.S. Cohort and Period Fertility Tables 1917-1980, produced by the National Institute of Child Health and Human Development, National Institutes of Health, compiled by Robert L. Heuser.

Population and death statistics tables from developing countries amassed by the Organisation for Economic Co-operation and Development (OECD).

The World Fertility Survey (WFS), a collection of high-quality, internationally comparable surveys of human fertility conducted in 41 developing countries in the late seventies and early eighties.

