# Optimizing the Utility of Large Electronic Health Records Data in Data-Driven Health Research

> **NIH R15** · TENNESSEE STATE UNIVERSITY · 2020 · $421,700

## Abstract

Medical centers continue to archive patient follow-up data in Electronic Medical Records (EMR), which have tremendous value for discovering new knowledge and insights. The large volume of EMR data can play an important role in improving the accuracy and generalizability of predictive models in healthcare, especially when misdiagnosis is known to be the third leading cause of death in the United States. Despite these merits, EMR data are invariably corrupted by missing values, outliers, and unrealistic measurements, which prevent researchers from fully utilizing such abundant data in many important studies. Many studies simply discard a large number of samples to eliminate missingness, which eventually biases their data-driven analytical models. Existing techniques for missing data imputation use simplified linear models and are mostly suited to cross-sectional missingness; they ignore the longitudinal missingness found in patient follow-up data.

This proposal aims to investigate novel artificial intelligence (AI) based models to improve the quality and utility of EMR data in preparation for data-driven retrospective studies. Toward this end, the goals of the project are 1) to investigate data imputation models that are more accurate and robust than existing ones and 2) to adapt state-of-the-art deep learning techniques to prepare an optimal representation of large EMR data. The proposed research will 1) maximize the quality and utility of EMR data to support a multitude of retrospective studies, 2) enable visualization of complex patient data, 3) identify more important and predictive clinical parameters, and 4) yield a compact and optimal representation of large EMR datasets. We hypothesize that EMR data optimally processed with state-of-the-art AI models can model patient risk more accurately than existing statistical and clinical risk models.

This project will combine the complementary expertise of the collaborators, Dr. Manar Samad, PhD (Computer Science), Dr. Owen Johnson, DPH (Biostatistics and Public Health), and Dr. Edilberto Raynes, MD, PhD (Medicine), along with the participating undergraduate students at Tennessee State University (TSU). The proposal entails several research and development components that will allow undergraduate students to gain valuable research and analytical skills in data science, programming, and health informatics. The project activities will expose health science students to AI-based computing solutions to broaden the scope of their future health research and careers. This project will help TSU prepare a strong workforce of minority students who will gain competitive skill sets in data science and health informatics, which are currently in high demand almost everywhere. Overall, the project will develop a data-capable workforce to strengthen an interdisciplinary research capacity and collaboration b...
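
The missing-data problem the abstract describes can be made concrete with a small sketch. The example below contrasts discarding incomplete rows (listwise deletion) with last-observation-carried-forward (LOCF), a simple longitudinal baseline. LOCF is swapped in here purely for illustration and is not the AI-based imputation the project proposes; the record layout, patient IDs, and blood-pressure values are all hypothetical.

```python
# Hypothetical follow-up records: (patient_id, visit_number, systolic_bp).
# None marks a missing measurement. All values are made up for illustration.
records = [
    ("p1", 1, 120.0),
    ("p1", 2, None),
    ("p1", 3, 126.0),
    ("p2", 1, None),
    ("p2", 2, 140.0),
    ("p2", 3, None),
]


def complete_cases(rows):
    """Listwise deletion: keep only rows whose measurement is present."""
    return [r for r in rows if r[2] is not None]


def locf(rows):
    """Carry each patient's last observed value forward across visits.

    A visit with no earlier observation for that patient stays missing.
    """
    last_seen = {}
    filled = []
    # Process visits in order within each patient.
    for pid, visit, value in sorted(rows, key=lambda r: (r[0], r[1])):
        if value is None:
            value = last_seen.get(pid)  # None if nothing observed yet
        else:
            last_seen[pid] = value
        filled.append((pid, visit, value))
    return filled


kept = complete_cases(records)
filled = locf(records)
print(len(kept), "of", len(records), "rows survive listwise deletion")
print(sum(v is not None for _, _, v in filled), "rows have a value after LOCF")
```

In this toy dataset, deletion keeps half of the rows, while the longitudinal fill recovers all but the one visit that has no earlier observation to borrow from. That gap is the kind of follow-up structure that cross-sectional methods do not exploit.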

## Key facts

- **NIH application ID:** 10111205
- **Project number:** 1R15LM013569-01
- **Recipient organization:** TENNESSEE STATE UNIVERSITY
- **Principal Investigator:** Manar Samad
- **Activity code:** R15
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $421,700
- **Award type:** 1
- **Project period:** 2020-09-18 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10111205

## Citation

> US National Institutes of Health, RePORTER application 10111205, Optimizing the Utility of Large Electronic Health Records Data in Data-Driven Health Research (1R15LM013569-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10111205. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*