# Use of NLP to Extract Risk Indicators for Immunologic Disease from the Text of EHRs (UNIITE)

> **NIH R21** · VIRGINIA POLYTECHNIC INST AND ST UNIV · 2024 · $169,779

## Abstract

Title: Use of NLP to extract risk Indicators for Immunologic disease from the Text of EHRs (UNIITE)
Summary:
The Health Information Technology for Economic and Clinical Health (HITECH) Act enabled widespread
adoption of electronic health records (EHRs). In lockstep, healthcare systems have seen the impact
of biomedical informatics techniques such as natural language processing (NLP) for mining, inference, and
classification of EHR data. Combining digital health records and available analytical tools represents an
opportunity to improve care for patients who suffer from rare diseases such as primary immune deficiency
(PID) where optimal outcomes are predicated upon early detection. However, only a fraction of PID
patients receive a diagnosis before sustaining serious infections. This underscores a need for novel methods
to improve diagnostic rates and advance understanding of PID. At present, barriers to detecting
PIDs include recognizing heterogeneous clinical features, distinguishing infections in PID from those in the
normal host, and a general lack of awareness about the diseases. Creating precise analytical methods to
mine and make predictions from EHR data represents a potential solution to these challenges. As such,
our goal is to develop a system for automatic extraction of PID risk indicators from EHR notes for the
purpose of broadly improving diagnosis and advancing knowledge about human immunologic disease.
Our preliminary work suggests that structured EHR data such as problem list elements and diagnostic
codes can be used to develop a probabilistic framework for assessing risk of PID, but they are limited and
do not capture the full range of concepts needed to optimally characterize PID. Data elements mined
from text can be coupled with presently available EHR structured data and ontologies for improved annotation
of PID. Building a framework of PID-specific risk indicators will enable NLP approaches for PID detection
and improved understanding about human immune dysfunction. The Specific Aims for our proposal are
as follows: (1) to use a data-driven approach for identifying and enumerating PID risk indicators from
EHR text; and (2) to develop NLP methods for automatically extracting key PID risk indicators from EHR text.
Our proposal leverages a very large corpus of EHR note text captured from over 3,000 PID patients prior
to their ultimate diagnosis, as well as over 285,000 control patients. Mining this dataset and synergistically
using state-of-the-art NLP methodologies will build the foundation of a potent and interoperable text
mining system for PID risk detection. We expect this work to advance disease detection and the characterization
of specific human immune defects, and to allow for additional inference that combines clinical, laboratory,
and molecular information about PID.

## Key facts

- **NIH application ID:** 10982603
- **Project number:** 7R21AI164100-03
- **Recipient organization:** VIRGINIA POLYTECHNIC INST AND ST UNIV
- **Principal Investigator:** Nicholas L Rider
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $169,779
- **Award type:** 7
- **Project period:** 2022-02-18 → 2026-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10982603

## Citation

> US National Institutes of Health, RePORTER application 10982603, Use of NLP to Extract Risk Indicators for Immunologic Disease from the Text of EHRs (UNIITE) (7R21AI164100-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10982603. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*