Use of NLP to Extract Risk Indicators for Immunologic Disease from the Text of EHRs (UNIITE)

NIH RePORTER · NIH · R21 · $209,417

Abstract

PROJECT SUMMARY

The Health Information Technology for Economic and Clinical Health (HITECH) Act enabled widespread adoption of electronic health records (EHRs). In lockstep, healthcare systems have seen the impact of biomedical informatics techniques such as natural language processing (NLP) for mining, inference, and classification of EHR data. Combining digital health records with available analytical tools represents an opportunity to improve care for patients who suffer from rare diseases such as primary immune deficiency (PID), where optimal outcomes are predicated upon early detection. However, only a fraction of PID patients receive a diagnosis before sustaining serious infections. This underscores the need for novel methods to improve diagnostic rates and to drive understanding of PID. At present, barriers to detecting PIDs include recognizing heterogeneous clinical features, distinguishing infections in PID from those in the normal host, and a general lack of awareness about the diseases. Creating precise analytical methods to mine and make predictions from EHR data represents a potential solution to these challenges. As such, our goal is to develop a system for automatic extraction of PID risk indicators from EHR notes, for the purpose of improving diagnosis on a wide scale and advancing knowledge about human immunologic disease.

Our preliminary work suggests that structured EHR data, such as problem list elements and diagnostic codes, can be used to develop a probabilistic framework for assessing risk of PID, but these data are limited and do not capture the full range of concepts needed to optimally characterize PID. Data elements mined from text can be coupled with presently available structured EHR data and ontologies for improved annotation of PID. Building a framework of PID-specific risk indicators will enable NLP approaches for PID detection and an improved understanding of human immune dysfunction.

The Specific Aims for our proposal are as follows: 1) To use a data-driven approach for identifying and enumerating PID risk indicators from EHR text. 2) To develop NLP methods for automatically extracting key PID risk indicators from EHR text. Our proposal leverages a very large corpus of EHR note text captured from over 2,000 PID patients prior to their ultimate diagnosis, as well as almost 5,000 control patients. Mining this dataset and synergistically using state-of-the-art NLP methodologies will build the foundation of a potent and interoperable text mining system for PID risk detection. We expect this work to advance disease detection and the characterization of specific human immune defects, and to allow for additional inference that combines clinical, laboratory, and molecular information about PID.

Key facts

NIH application ID
10615338
Project number
7R21AI164100-02
Recipient
LIBERTY UNIVERSITY, INC.
Principal Investigator
Nicholas L Rider
Activity code
R21
Funding institute
NIH
Fiscal year
2022
Award amount
$209,417
Award type
7
Project period
2022-02-18 → 2023-12-31