# Computable social factor phenotyping using EHR and HIE data

> **NIH AHRQ R01** · INDIANA UNIVERSITY INDIANAPOLIS · 2022 · $397,558

## Abstract

Most health systems attempt to measure patients' social risk factors, but such data collection is typically fraught
with operational and conceptual difficulties. Multi-domain screening questionnaires face reliability, validity, and
workflow challenges. Area-level data are not valid proxies for individual characteristics. Diagnosis codes are
underutilized. The day-to-day use of natural language processing (NLP) to extract social factors from text is
beyond the capacity of most organizations. Thus, health care organizations need more implementable and valid
approaches to measuring social factors. With implementable and valid approaches, health systems will more
effectively address the negative cost, quality and health outcomes associated with patients' social risk factors.
The objective of this proposal is to assess the validity of patient-level computable social factor phenotypes for
use in predicting patients' risk of increased healthcare costs and utilization. Computable phenotypes are composites
of characteristics defined through single data elements or a collection of data elements, observations or
events. Because these phenotypes derive from existing healthcare operations and electronic data systems, they
are well-positioned for widespread implementation. Our central hypothesis is that phenotypes computed from
existing structured demographic, clinical, and business operations data will support equally or more valid inferences
about patient social risks than other measurement approaches. Building upon strong preliminary data and
direction from experts in the field, we will determine the validity and usefulness of six novel social factor phenotypes
computed from already collected information within EHRs and health information exchanges (HIE) through
the following aims: Aim 1, *Assess the concurrent validity of patient-level computable social factor phenotypes*,
will compare the concurrent validity of computed phenotypes, multi-domain questionnaires, and NLP against gold
standard measures of social factors in two health systems. Aim 2, *Assess the predictive validity of patient-level
computable social factor phenotypes*, will assess the validity of computable phenotypes, multi-domain questionnaires,
NLP, and combined approaches in predicting costs and utilization. Aim 3, *Assess the reliability (bias) of
patient-level computable social factor phenotypes across patient gender, race, ethnicity, and age*, will assess the
reproducibility of measurement approaches across underserved populations. We will employ a multi-method
research approach to identify and mitigate potential bias. This project will lead to more valid and implementable
approaches to patient social factor measurement. The proposed research is significant because it directly addresses
the challenges organizations face in addressing patients' social risks and will provide key inputs to
support organizations' efforts at achieving a learning health system. This proposal is innovative by adva...

## Key facts

- **NIH application ID:** 10488222
- **Project number:** 5R01HS028636-02
- **Recipient organization:** INDIANA UNIVERSITY INDIANAPOLIS
- **Principal Investigator:** Joshua Ryan Vest
- **Activity code:** R01
- **Funding institute:** AHRQ
- **Fiscal year:** 2022
- **Award amount:** $397,558
- **Award type:** 5
- **Project period:** 2021-09-30 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10488222

## Citation

> US National Institutes of Health, RePORTER application 10488222, Computable social factor phenotyping using EHR and HIE data (5R01HS028636-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10488222. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
