# Using natural language processing and machine learning to identify potentially preventable hospital admissions among outpatients with chronic lung diseases

> **NIH K23** · UNIVERSITY OF PENNSYLVANIA · 2021 · $173,801

## Abstract

Project Summary
Patients living with chronic lung diseases (CLDs) are frequently admitted to the hospital for potentially preventable
causes. Such admissions may be discordant with patient preferences and/or represent a low-value allocation of
health system resources. To anticipate such admissions, existing clinical prediction models in this field typically
produce an “all-cause” risk estimate which, even if accurate, overlooks the actionable mechanisms behind
admission risk and therefore fails to identify a prescribed response. This limitation may explain why most
intervention bundles tested in this population have achieved, at best, only modest reductions in hospital
admissions and readmissions. An opportunity exists, therefore, to predict hospitalization risk while simultaneously identifying
patient phenotypes (i.e., some constellation of social, demographic, clinical, and other characteristics) for which
known preventive interventions exist. The proposed study seeks to overcome these limitations and capitalize
on this opportunity by (1) conducting semi-structured interviews with hospitalized patients with CLDs, and their
caregivers and clinicians, to directly identify modifiable risks and their associated phenotypes driving hospital 
admissions; (2) using natural language processing (NLP) techniques to build classification models that will leverage
nuanced narrative, social, and clinical information in the unstructured text of clinical encounter notes to identify
patients with these phenotypes; and (3) building risk prediction models focused on actionable phenotypes with
a wide array of traditional regression and machine learning approaches while also incorporating large numbers
of predictor variables from text data and accounting for time-varying trends. The candidate's preliminary work
using basic NLP techniques to significantly improve the discrimination of clinical prediction models in an inpatient
population has motivated this methodologic approach. The rising burden and costs of hospitalizations associated
with CLDs, and the increasing attention from federal payers, highlight the critical nature of this work. Completion
of this research will build upon the candidate's past training, which includes a Master of Science in Health Policy
Research obtained with NHLBI T32 support, and will provide the experience, education, and mentorship to allow
the candidate to become a fully independent investigator. Based on the candidate's tailored training plan, he will
acquire advanced skills in mixed-methods research, NLP, and trial design through coursework, close
mentoring and supervision, and direct practice. These skills will position him ideally to submit successful R01s testing
the deployment of the proposed clinical prediction models in real-world settings. The candidate's primary mentor,
collaborators, and advisors will ensure adherence to the proposed timeline and goals and provide a 
supportive environment for him to develop ...

## Key facts

- **NIH application ID:** 10134412
- **Project number:** 5K23HL141639-04
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Gary Weissman
- **Activity code:** K23
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $173,801
- **Award type:** 5
- **Project period:** 2018-04-09 → 2023-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10134412

## Citation

> US National Institutes of Health, RePORTER application 10134412, Using natural language processing and machine learning to identify potentially preventable hospital admissions among outpatients with chronic lung diseases (5K23HL141639-04). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10134412. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
