# Use of NLP to Extract Risk Indicators for Immunologic Disease from the Text of EHRs (UNIITE)

> **NIH R21** · LIBERTY UNIVERSITY, INC. · 2022 · $209,417

## Abstract

The Health Information Technology for Economic and Clinical Health (HITECH) Act enabled widespread
adoption of electronic health records (EHRs). In lockstep, healthcare systems have seen the impact
of biomedical informatics techniques such as natural language processing (NLP) for mining, drawing inferences from, and
classifying EHR data. Combining digital health records and available analytical tools represents an opportunity
to improve care for patients who suffer from rare diseases such as primary immune deficiency (PID), where
optimal outcomes are predicated upon early detection. However, only a fraction of PID patients receive a
diagnosis before sustaining serious infections. This underscores a need for novel methods to improve diagnostic
rates and to advance understanding of PID. At present, barriers to detecting PIDs include recognizing
heterogeneous clinical features, distinguishing infections in PID from those of the normal host, and a general lack
of awareness about the diseases. Creating precise analytical methods to mine and make predictions from EHR
data represents a potential solution to these challenges. As such, our goal is to develop a system for automatic
extraction of PID risk indicators from EHR notes for the purpose of improving widespread diagnosis and
advancing knowledge about human immunologic disease. Our preliminary work suggests that structured EHR
data such as problem list elements and diagnostic codes can be used to develop a probabilistic framework for
assessing risk of PID, but these data are limited and do not capture the full range of concepts needed to optimally
characterize PID. Data elements mined from text can couple to presently available EHR structured data and
ontologies for improved annotation of PID. Building a framework of PID-specific risk indicators will enable NLP
approaches for PID detection and improved understanding about human immune dysfunction. The Specific
Aims for our proposal are as follows: (1) to use a data-driven approach for identifying and enumerating PID risk
indicators from EHR text; and (2) to develop NLP methods for automatically extracting key PID risk indicators from
EHR text. Our proposal leverages a very large corpus of EHR note text captured from over 2000 PID patients
prior to their ultimate diagnosis, as well as almost 5000 control patients. Mining this dataset and synergistically
using state-of-the-art NLP methodologies will build the foundation of a potent and interoperable text mining
system for PID risk detection. We expect this work to advance disease detection and the characterization of specific
human immune defects, and to allow for additional inference that combines clinical, laboratory, and molecular
information about PID.

## Key facts

- **NIH application ID:** 10615338
- **Project number:** 7R21AI164100-02
- **Recipient organization:** LIBERTY UNIVERSITY, INC.
- **Principal Investigator:** Nicholas L Rider
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $209,417
- **Award type:** 7
- **Project period:** 2022-02-18 → 2023-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10615338

## Citation

> US National Institutes of Health, RePORTER application 10615338, Use of NLP to Extract Risk Indicators for Immunologic Disease from the Text of EHRs (UNIITE) (7R21AI164100-02). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10615338. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
