# CenSoc: A New, Public, Individual-level Dataset for Studying Mortality Inequality

> **NIH R01** · UNIVERSITY OF CALIFORNIA BERKELEY · 2022 · $558,114

## Abstract

The CENSOC project – so named because it links 1940 Census data with Social Security Administration death
records – will construct and share a new, large-scale, public microdata set to be used for advancing
understanding of mortality disparities in the United States. The project uses record linkage techniques to match
deaths at ages 65 and over, observed from 1975 to 2009, back to individual, family, and neighborhood
characteristics in the census. Building on preliminary studies, we estimate that the use of modern data-linkage
techniques will allow us to construct a data set of about 15 million deaths, more than 30 times the size of the
largest existing sample surveys. The unprecedented scale and detail of CENSOC data will allow researchers to
make new discoveries in areas such as (a) mortality disparities by education, national origin, and race; (b)
early life conditions and later-life mortality; and (c) geographic variation and the neighborhood determinants of
mortality. These topics are increasingly important for understanding widening disparities in life expectancy
in the United States.

The creation and distribution of population-level administrative mortality data with individual characteristics is
central to the goal of promoting rigorous and replicable scientific research in mortality in the United States.
Following the model of the Human Mortality Database (HMD) and the Integrated Public Use Microdata Series
(IPUMS), the CENSOC data will be available for analysis on a distribution website. For those wishing to work
with identifiable records and the complete count census, access will be possible using the existing network of
more than 50 Complete Count Census repositories established under licensing agreements between
U.S. academic institutions and the University of Minnesota. This secure access will allow additional data sets
with individual identifiers to be linked to the CENSOC data.

To facilitate usage of this rich dataset, the project will include development of new methods for estimating
mortality rates especially appropriate for linked data. We will also carry out a set of “high resolution” studies on
mortality disparities and longevity determinants that will serve to advance knowledge as well as demonstrate
the potential uses of the CENSOC data set.

By taking advantage of existing administrative records, the CENSOC project has the potential to provide a vast,
richly detailed, public “big data” resource for researchers studying old-age mortality disparities and the
determinants of longevity.

## Key facts

- **NIH application ID:** 10371013
- **Project number:** 5R01AG058940-04
- **Recipient organization:** UNIVERSITY OF CALIFORNIA BERKELEY
- **Principal Investigator:** JOSHUA R. GOLDSTEIN
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $558,114
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2019-08-01 → 2024-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10371013

## Citation

> US National Institutes of Health, RePORTER application 10371013, CenSoc: A New, Public, Individual-level Dataset for Studying Mortality Inequality (5R01AG058940-04). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10371013. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
