# Large-Scale Nationally Representative Person-Generated Health Data for Development of Generalizable Data Science Methodologies for Precision Public Health

> **NIH R01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2023 · $254,189

## Abstract

Large-Scale Nationally Representative Person-Generated Health Data for Development of Generalizable
Data Science Methodologies for Precision Public Health. Racial and ethnic minorities, socioeconomically
disadvantaged groups, and other underserved populations experience disproportionately poor health outcomes
despite decades of research linking social determinants (SDs) to variations in health outcomes. Many public
health approaches use population averages to design “one-size-fits-all” interventions that increase the probability
of achieving the best outcomes for the average person, but such interventions are limited by population heterogeneity in the number,
magnitude, interplay, and amplification of SDs. Precision public health (PPH) emerged as an approach that uses digital technologies
(DTs) to develop interventions targeting the unique needs of specific populations, with the aim of improving health and reducing
disparities. Analysis of the voluminous, precise, continuous, and longitudinal data generated by DTs holds great
promise for PPH because smartphones, Internet of Things (IoT) devices, and wearable sensors are becoming ubiquitous, generating
data on the environment, transportation, geolocation, diet, exercise, social interactions, and daily activities. These
person-generated health data (PGHD) have unprecedented potential to add rich insight into everyday human
behaviors to traditional health research. Although clinical PGHD applications are at an early stage, there is rapid
progress in the development of digital indicators of health, offering virtually limitless potential. Because PGHD are
typically captured outside controlled research settings, they share the challenges of other non-traditional data, which
impede their acceptance and use across the healthcare ecosystem. First, PGHD are vulnerable to input bias
because users of consumer DTs are a self-selected group. Second, PGHD suffer from poor internal data quality due
to highly variable completeness, for reasons that are not always equally distributed across individuals (e.g.,
connectivity issues, battery depletion, user forgetfulness, user error). Together, input bias and poor data quality lead to
poor external validity, in which analytics derived from PGHD do not generalize to the broader population. The
objective of this partnership between the RAND Corporation and Evidation Health is to improve the generalizability
of data science methods for PGHD so that all population groups are represented, including the historically
underserved. We will accomplish this goal through three aims: (i) generate PGHD from a nationally representative
probability sample of Americans to understand the social distribution of user engagement with health DTs and of
poor sleep health; (ii) develop a methodology that characterizes missing data within PGHD and selects
appropriate imputation strategies (existing and novel) optimized to reduce model bias and sociodemographic
input disparities; and (iii) create a propensity-score-based statistical weighting methodology to
improve the effectiveness and applicabilit...
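
Aim (iii) names propensity-score weighting, but the abstract is truncated before the method is specified. The following is a minimal sketch of one common variant, assuming the PGHD participants and a reference probability sample share a set of covariates: fit a logistic model for membership in the PGHD sample, then weight each PGHD participant by the odds of reference-sample membership, (1 − p)/p. The helper names `fit_logistic` and `propensity_weights`, the single "aged 65+" covariate, and the sample counts are hypothetical illustrations, not the project's implementation.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Fit a (lightly ridge-penalized) logistic regression by Newton-Raphson.

    X is an (n, k) design matrix that already includes an intercept column;
    y is a length-n vector of 0/1 labels. Returns the coefficient vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # predicted probabilities
        w = p * (1.0 - p)                                # IRLS weights
        grad = X.T @ (y - p) - ridge * beta              # penalized score
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)        # Newton step
    return beta


def propensity_weights(X_pghd, X_ref):
    """Hypothetical helper: pseudo-weights aligning a PGHD sample with a reference sample.

    Stacks both samples, models P(row comes from the PGHD sample | covariates),
    and weights each PGHD row by the odds (1 - p) / p of reference membership.
    Weights are rescaled to average 1 over the PGHD sample.
    """
    n_pghd = len(X_pghd)
    X = np.vstack([X_pghd, X_ref])
    X = np.column_stack([np.ones(len(X)), X])            # add intercept column
    y = np.concatenate([np.ones(n_pghd), np.zeros(len(X_ref))])
    beta = fit_logistic(X, y)
    p = 1.0 / (1.0 + np.exp(-X[:n_pghd] @ beta))         # propensity for PGHD rows
    w = (1.0 - p) / p
    return w * n_pghd / w.sum()


# Hypothetical toy data: one binary covariate, "aged 65+".
# Older adults are 20% of the self-selected PGHD sample but 50% of the reference sample.
X_pghd = np.array([1.0] * 200 + [0.0] * 800).reshape(-1, 1)
X_ref = np.array([1.0] * 500 + [0.0] * 500).reshape(-1, 1)

w = propensity_weights(X_pghd, X_ref)
print(round(X_pghd[:, 0].mean(), 3))                     # unweighted share: 0.2
print(round(np.average(X_pghd[:, 0], weights=w), 3))     # weighted share: 0.5
```

With a single binary covariate the logistic model is saturated, so the weighted share of older participants essentially reproduces the reference sample's 50%. In practice, reference-sample survey weights would usually multiply these pseudo-weights and extreme weights are often trimmed; both steps are omitted here for brevity.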

## Key facts

- **NIH application ID:** 10591527
- **Project number:** 5R01LM013237-04
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Ritika Ratnam Chaturvedi
- **Activity code:** R01
- **Funding institute:** NIH National Library of Medicine (NLM)
- **Fiscal year:** 2023
- **Award amount:** $254,189
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2020-07-01 → 2025-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10591527

## Citation

> US National Institutes of Health, RePORTER application 10591527, Large-Scale Nationally Representative Person-Generated Health Data for Development of Generalizable Data Science Methodologies for Precision Public Health (5R01LM013237-04). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10591527. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
