Data and Methods

Jeanne Altmann • Elizabeth Armstrong • Susan Fiske • Margaret Frye • Noreen Goldman • Bryan Grenfell • Douglas S. Massey • C. Jessica E. Metcalf • Elizabeth Levy Paluck • Germán Rodríguez • Marta Tienda • James Trussell • Yu Xie


Jeanne Altmann collaborated on “Female and Male Life Tables for Seven Wild Primate Species,” published in Scientific Data, with Bronikowski, A. (Iowa State University) et al. The authors provide male and female census count data, age-specific survivorship, and female age-specific fertility estimates for populations of seven wild primates that have been continuously monitored for at least 29 years: sifaka (Propithecus verreauxi) in Madagascar; muriqui (Brachyteles hypoxanthus) in Brazil; capuchin (Cebus capucinus) in Costa Rica; baboon (Papio cynocephalus) and blue monkey (Cercopithecus mitis) in Kenya; chimpanzee (Pan troglodytes) in Tanzania; and gorilla (Gorilla beringei) in Rwanda. Using one-year age-class intervals, they computed point estimates of age-specific survival for both sexes. In all species, their survival estimates for the dispersing sex are affected by heavy censoring. They also calculated reproductive value, life expectancy, and mortality hazards for females. They used bootstrapping to place confidence intervals on life-table summary metrics (R0, the net reproductive rate; λ, the population growth rate; and G, the generation time). These data have high potential for reuse; they derive from continuous population monitoring of long-lived organisms and will be invaluable for addressing questions about comparative demography, primate conservation and human evolution.
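To make the three life-table summary metrics concrete, the following is a minimal sketch of how they are conventionally computed from age-specific survivorship and fertility. The life-table values, the function names, and the choice of the cohort definition of generation time are all assumptions made for illustration; they are not taken from the paper, and the bootstrap step used for confidence intervals is omitted.

```python
# Hypothetical female life table (illustrative values, not from the paper).
ages = [0, 1, 2, 3, 4]            # age class x, in years
lx = [1.0, 0.8, 0.6, 0.4, 0.2]    # probability of surviving to age x
mx = [0.0, 0.0, 0.5, 1.0, 1.0]    # daughters per female at age x


def net_reproductive_rate(lx, mx):
    """R0: expected number of daughters per female over her lifetime."""
    return sum(l * m for l, m in zip(lx, mx))


def generation_time(ages, lx, mx):
    """G (cohort definition, an assumption): mean age of mothers at birth."""
    weighted = sum(x * l * m for x, l, m in zip(ages, lx, mx))
    return weighted / net_reproductive_rate(lx, mx)


def euler_lotka(lam, ages, lx, mx):
    """Left side of the Euler-Lotka equation minus 1; zero at the true rate."""
    return sum(l * m * lam ** (-x) for x, l, m in zip(ages, lx, mx)) - 1.0


def growth_rate(ages, lx, mx, lo=0.01, hi=10.0, tol=1e-10):
    """Population growth rate: solve the Euler-Lotka equation by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if euler_lotka(mid, ages, lx, mx) > 0:
            lo = mid   # sum too large, so the growth rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


r0 = net_reproductive_rate(lx, mx)
g = generation_time(ages, lx, mx)
lam = growth_rate(ages, lx, mx)
print(f"R0 = {r0:.3f}, G = {g:.3f} years, growth rate = {lam:.3f}")
```

In this toy table R0 is below 1, so the solved growth rate is also below 1, meaning the hypothetical population would decline. A bootstrap would resample individuals, rebuild the table, and recompute all three quantities for each replicate.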


Elizabeth Armstrong and Miranda Waggoner (Florida State University) continue to work on their book manuscript that examines the uses of data from the Dutch Hunger Winter. During the winter of 1944-45, Nazi forces occupied the western provinces of the Netherlands, cutting off food and fuel shipments to the area. A severe famine ensued, which came to be known as the Dutch Hunger Winter, affecting some 4-5 million people. The health consequences of the famine have been extensively studied; in particular, data on the effects of exposure to famine in utero collected through the Dutch Famine Birth Cohort Study have become paradigmatic within epidemiology and in the emerging field of epigenetics. In addition, these data have been discussed extensively in the obstetric literature, the popular press, and increasingly, in social sciences like economics. This project examines patterns of dissemination and interpretation of evidence from the Dutch Hunger Winter through time and disciplinary space.


Matthew Desmond and Carl Gershenson (Washington University) published, “Who Gets Evicted? Assessing Individual, Neighborhood, and Network Factors,” in Social Science Research. The prevalence and consequences of eviction have transformed the lived experience of urban poverty in America, yet little is known about why some families avoid eviction while others do not. Applying discrete hazard models to a unique dataset of renters, this study empirically evaluates individual, neighborhood, and social network characteristics that explain disparities in displacement from housing. Family size, job loss, neighborhood crime and eviction rates, and network disadvantage are identified as significant and robust predictors of eviction, net of missed rental payments and other relevant factors. This study advances urban sociology and inequality research and informs policy interventions designed to prevent eviction and stem its consequences.
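As background on discrete hazard models in general, the sketch below shows the usual first step: expanding each renter's record into one row per period at risk, then computing the raw hazard of eviction in each period. The renter records, field layout, and function names are hypothetical and are not drawn from the study's dataset or its model specification.

```python
# Hypothetical renters: (id, last period observed, evicted in that period?)
renters = [
    ("a", 3, True),
    ("b", 2, False),   # censored after period 2
    ("c", 1, True),
    ("d", 3, False),   # censored after period 3
]


def person_periods(renters):
    """Expand each renter into one (id, period, event) row per period at risk."""
    rows = []
    for rid, last_period, evicted in renters:
        for t in range(1, last_period + 1):
            event = 1 if (evicted and t == last_period) else 0
            rows.append((rid, t, event))
    return rows


def hazard_by_period(rows):
    """Raw discrete-time hazard: events divided by renters at risk, per period."""
    at_risk, events = {}, {}
    for _, t, event in rows:
        at_risk[t] = at_risk.get(t, 0) + 1
        events[t] = events.get(t, 0) + event
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


rows = person_periods(renters)
hazards = hazard_by_period(rows)
print(hazards)
```

A discrete-time hazard model is then typically fit as a logistic regression on these person-period rows, with the event indicator as the outcome and individual, neighborhood, and network covariates as predictors.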


Edward Burkley (Oklahoma State University), Federica Durante (University of Milano-Bicocca), Susan Fiske, Melissa Burkley (Oklahoma State University), and Angela Andrade (University of Arkansas, Pine Bluff) published “Structure and Content of Native American Stereotypic Subgroups: Not just (Ig)noble,” in Cultural Diversity and Ethnic Minority Psychology. Prejudice against Native Americans as an overall group generally polarizes into positive and negative stereotypic extremes, but distinct subgroups may explain this variability. Using college student samples (Study 1), a preliminary study identified common Native American subgroups and then a main study (N = 153, 74% women, 73% White, mean age = 19 years) had participants rate these subgroups on basic dimensions of stereotype content (i.e., warmth and competence), elicited emotions (e.g., admiration, contempt), and elicited behaviors (e.g., facilitation, harm). In Study 2, these preliminary study and main study procedures were replicated using nationwide samples (main study: N = 139, 51% women, 78% White, mean age = 35 years).

For the most part, similar Native American subgroups emerged in both samples. Using the stereotype content model (SCM; Fiske, Cuddy, Glick, and Xu, 2002), the subgroups were found to vary along a competence-by-warmth space. The majority of subgroups (e.g., alcoholics, lazy) were judged low in both competence and warmth. Additional subgroups (e.g., casino operators, warriors) were ambivalently judged as high on competence but low on warmth. Subgroups perceived as high in both competence and warmth elicited more admiration, those low in both competence and warmth elicited more contempt, those high in competence elicited more passive facilitation and less passive harm, and those high in warmth elicited more active facilitation and less active harm. Native American stereotypes are apparently characterized by both noble and ignoble subgroups, highlighting the importance of studying stereotypes at the subgroup level.


Gandalf Nicolas, Malena de la Fuente, and Susan Fiske’s article, “Mind the Overlap in Categorization: A Review of Crossed-Categorization, Intersectionality, and Multiracial Perception,” was published in Group Processes and Intergroup Relations by Sage Journals. Research on social categorization continues, with one growth area being multiple categorization. Various approaches study questions that, although different in scope and content, potentially tap the same underlying processes. Current models that aim to understand judgments about targets who belong to multiple social groups include algebraic and non-algebraic models of crossed categorization, as well as theories related to intersectionality and multiracial categorization. The literature on these models and theories highlights some strengths and limitations. The review discusses potential overlap between models that have mostly advanced independently of each other. Future research can take a more encompassing stance to acknowledge this overlap.


Parijat Chakrabarti and Margaret Frye collaborated on, “The Promise of Quantitative Text Analysis for Demography,” published in Demographic Research. This paper explores the advantages of applying computational text analysis to qualitative data in demography. It begins by examining three particular issues that demographers often face in analyzing qualitative data—large samples, the challenge of comparing qualitative data across external categories, and connecting micro-level analysis to macro-patterns in the data—and discusses ways that new tools from machine learning and computer science might help to address these issues. Three applications of text analysis are described using a set of conversational journals about HIV/AIDS from Malawi. These applications vary in the extent to which computational techniques either supplement or supplant more traditional methods of qualitative data analysis. In the first example, computational techniques are used for topic exploration and sample selection; in the second, to analyze particular themes by gender and over time; and in the third, to demonstrate ways in which a mixed-methods approach can increase the analytic potential of qualitative data.


Hiram Beltrán-Sánchez (University of Wisconsin, Madison), Anne Pebley (University of California, Los Angeles), and Noreen Goldman published their joint research paper entitled, “Links between Primary Occupation and Functional Limitations among Older Adults in Mexico,” in Social Science and Medicine - Population Health. Social inequalities in health and disability are often attributed to differences in childhood adversity, access to care, health behavior, residential environments, stress, and the psychosocial aspects of work environments. Yet disadvantaged people are also more likely to hold jobs involving heavy physical labor, repetitive movement, ergonomic strain, and safety hazards. The authors investigated the role of physical work conditions in contributing to social inequality in mobility among older adults in Mexico, using data from the Mexican Health and Aging Survey (MHAS) and an innovative statistical modeling approach. They used data on categories of primary adult occupation as proxies for jobs with more or less demanding physical work requirements. Their results show that more physically demanding jobs are associated with mobility limitations at older ages, even after controlling for age and sex. Inclusion of job categories attenuates the effects of education and wealth on mobility limitations, suggesting that physical work conditions account for at least part of the socioeconomic differentials in mobility limitations in Mexico.


Douglas Massey published, “How Rising Minority Income Does (or Does Not) Lead to Residential Integration in the US,” in Atlas of Science. This paper examines how access to integrated neighborhoods changed for Asians, Hispanics, and African Americans from 1970 to 2010. Data come from the Decennial U.S. Census, with neighborhoods defined using census tracts, which are small geographic units established by census officials in collaboration with local authorities. The analysis draws on comparable census tract grids created for 287 consistently defined metropolitan areas from 1970 through 2010.


C. Jessica Metcalf, Michael J. Mina (Brigham & Women’s Hospital, Harvard University), Amy Winter, and Bryan Grenfell wrote, “Opportunities and Challenges of a World Serum Bank – Author’s reply,” published in The Lancet. In response to their Viewpoint proposing a World Serum Bank, Coates delineates how these data might be used to probe a major contemporary public health question: how climate change will affect the burden of infection. De Lusignan and Correa propose a pragmatic resource for initiating such a bank (i.e., deploying primary care sentinel networks) in conjunction with public health entities. Both raise excellent points. The authors agree in particular that the issue of consent is key and that diverse sources of data should be leveraged. However, they also caution that convenience samples could fall short in various ways of the ideal sample (representative of the general population and adequately powered) and therefore require careful analytical handling. As a result, an array of mathematical and statistical analyses will be necessary to understand the nuances of biases resulting from the inevitable dependence on diverse data obtained for different reasons.


Robin Gomila, Rebecca Littman, Graeme Blair (University of California, Los Angeles), and Elizabeth Paluck published “The Audio Check: A Method for Improving Data Quality and Detecting Data Fabrication” in Social Psychological and Personality Science, describing how audio recordings of interviews were used to detect and stop ongoing data fabrication. Data quality and trust in the data collection process are critical concerns in survey research, particularly when surveyors are needed for reaching “diverse and inconvenient subject pools.” In response to irregularities in a smartphone-based pilot survey data collection in Nigeria, the authors developed an audio check method that unobtrusively recorded surveyors reading questions aloud to participants. They present evidence that this method detected wholesale data fabrication in 14% of these surveys, prevented further fabrication, and improved data quality through provision of regular feedback to surveyors. A simulation demonstrates that undetected fabrication would have introduced significant bias in the analyses. The audio check performs well compared to more traditional methods of detecting fabrication, and a comparative cost–benefit analysis reveals a savings of more than U.S. $1,500 per surveyor by relying on the audio check. The audio check is a viable tool for psychologists who work with survey teams.


In The Stata Journal, Germán Rodríguez published an article entitled, “Literate Data Analysis with Stata and Markdown.” In this article, he introduces markstat, a command for combining Stata code and output with comments and annotations written in Markdown into a beautiful webpage or PDF file, thus encouraging literate programming and reproducible research. The command tangles the input separating Stata and Markdown code, runs the Stata code, relies on Pandoc to process the Markdown code, and then weaves the outputs into a single file. HTML documents may include inline and display math using MathJax. Generating PDF output requires access to LaTeX and a style file from Stata but works with the same input file.
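The tangle step can be pictured with a small sketch. This is not markstat's implementation: it is a Python illustration that assumes, for the purposes of the example, that Stata code is marked by a leading tab or four spaces while everything else is Markdown. The function name and the sample document are hypothetical.

```python
# Hypothetical input: Markdown prose with indented Stata commands.
SAMPLE = "\n".join([
    "# Fuel efficiency",
    "",
    "We load the data and fit a regression.",
    "",
    "    sysuse auto, clear",
    "    regress mpg weight",
    "",
    "Heavier cars travel fewer miles per gallon.",
])


def tangle(text):
    """Split a document into Markdown lines and Stata code lines.

    Assumption for this sketch: a line starting with a tab or four
    spaces is code; one level of indentation is removed from it.
    """
    markdown, code = [], []
    for line in text.splitlines():
        if line.startswith("\t"):
            code.append(line[1:])
        elif line.startswith("    "):
            code.append(line[4:])
        else:
            markdown.append(line)
    return markdown, code


markdown, code = tangle(SAMPLE)
print("Stata code to run:", code)
print("Markdown lines for Pandoc:", len(markdown))
```

In the workflow the article describes, the code portion would be run in Stata, the Markdown portion handed to Pandoc, and the two outputs woven back together in document order; the sketch covers only the separation.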


“The Network Survival Method for Estimating Adult Mortality: Evidence from a Survey Experiment in Rwanda,” written by Dennis Feehan, Mary Mahy (Joint United Nations Programme on HIV/AIDS (UNAIDS), Geneva, Switzerland), and Matthew Salganik, was published in Demography. Adult death rates are a critical indicator of population health and wellbeing. Wealthy countries have high-quality vital registration systems, but poor countries lack this infrastructure and must rely on estimates that are often problematic. In this paper, the authors introduce the network survival method, a new approach for estimating adult death rates. They derive the precise conditions under which it produces estimates that are consistent and unbiased. Further, they develop an analytical framework for sensitivity analysis. To assess the performance of the network survival method in a realistic setting, they conducted a nationally representative survey experiment in Rwanda (n=4,669). Network survival estimates were similar to estimates from other methods, even though the network survival estimates were made with substantially smaller samples and are based entirely on data from Rwanda, with no need for model life tables or pooling of data from other countries. Their analytic results demonstrate that the network survival method has attractive properties, and their empirical results show that it can be used in countries where reliable estimates of adult death rates are sorely needed.


Alexander Kindel, Michael Yeomans (Harvard University), Justin Reich (Massachusetts Institute of Technology), Brandon Stewart and Dustin Tingley (Harvard University) wrote, “Discourse: MOOC Discussion Forum Analysis at Scale” published in Proceedings of the Fourth. The authors present Discourse, a tool for coding and annotating MOOC discussion forum data. Despite the centrality of discussion forums to learning in online courses, few tools are available for analyzing these discussions in a context-aware way. The app Discourse scaffolds the process of coding forum data by enabling multiple coders to work with large amounts of forum data.


Melissa Martinson (Columbia University), Marta Tienda, and Julian Teitler (Social Indicators Survey Center, Columbia University) published, “Low Birthweight among Immigrants in Australia, the United Kingdom, and the United States,” in Social Science & Medicine. Immigrant women are less likely than their native-born counterparts to give birth to a low birthweight infant in the United States, and length of U.S. residence shrinks nativity differences in rates of low birthweight. Yet, little is known about how the U.S. context compares to immigrant low birthweight patterns in other countries. Using nationally representative data, the authors examine variations in the association between nativity and low birthweight in Australia, the United Kingdom, and the United States—three economically developed countries with long immigrant traditions, but different admission regimes. This study uses birth cohort data from these three destination countries to compare low birthweight between immigrant and native-born residents and then investigates how immigrant low birthweight varies by country of origin and duration in the host country. They find no significant difference in low birthweight between immigrants and native Australians, but for the United Kingdom, they find patterns of low birthweight by duration consistent with those found in the United States. Specifically, foreign-born status protects against low birthweight, though not uniformly across racial groups, except for new arrivals. The results suggest that low birthweight among immigrants is a product of several country-specific factors, including rates of low birthweight in sending countries, access to health services in host countries, and immigrant admission policies that advantage skilled migrants.


Marta Tienda published, “Multiplying Diversity: Family Unification and the Regional Origins of Late-Age U.S. Immigrants,” in International Migration Review. The author uses administrative data about new legal permanent residents to show how family unification chain migration changed both the age and regional origin of U.S. immigrants. Between 1981 and 1995, every 100 initiating immigrants from Asia sponsored between 220 and 255 relatives, but from 1996 through 2000, every 100 initiating immigrants from Asia sponsored nearly 400 relatives, with one in four aged 50 and above. The family migration multiplier for Latin Americans was boosted by the legalization program: from 1996 to 2000, every 100 initiating migrants from Latin America sponsored between 420 and 531 family members, of whom 18–21% were aged 50 and over.


In the Journal of Marriage and Family article, “Anticipated Emotions about Unintended Pregnancy in Relationship Context: Are Latinas Really Happier?,” Abigail Aiken (University of Texas, Austin) and James Trussell examine differences in women's anticipated emotional orientations towards unintended pregnancy by relationship status and race/ethnicity. Data from a prospective survey of 437 women aged 18-44 who intended no more children for at least two years were analyzed along with 27 in-depth interviews among a diverse sub-sample. Cohabiting women and women in a romantic relationship not living together were less likely to profess happiness even when partners’ intentions/feelings were controlled. The most prominent factor underlying negative feelings was partners’ anticipated lack of engagement with the emotional, physical, and financial toll of unintended childbearing. Contrary to conventional wisdom regarding the “Hispanic paradox”, foreign-born and US-born Latinas were no more likely to profess happiness than non-Hispanic whites or blacks. Moreover, foreign-born Latinas whose survey responses indicated happiness often revealed highly negative feelings at in-depth interview, citing pressure to conform to sociocultural norms surrounding motherhood and abortion.


Yu Xie and Hongwei Xu (University of Michigan) published, “Socioeconomic Inequalities in Health in China: A Reassessment with Data from the 2010–2012 China Family Panel Studies,” in Social Indicators Research. The paper notes that, despite well-documented high levels of socioeconomic inequality, health gradients by socioeconomic status (SES) in contemporary China have been reported to be limited. Using data from the 2010–2012 China Family Panel Studies, the authors reexamine associations between three sets of SES measures (human capital, material conditions, and political capital) and self-rated health among Chinese adults 18–70 years old, capitalizing on anchoring vignette data to adjust for reporting heterogeneity. They find strong evidence of substantial variations in reporting behaviors by education, cognition, and family wealth but not by family income or political capital. Failing to correct for reporting heterogeneity can bias the estimates of SES gradients in self-rated health by nearly 40%. After vignette adjustment, they find significantly positive associations of education, family income, wealth, and political capital with self-rated health. Individuals' cognitive capacity, however, does not predict self-rated health.