# Multi-objective representation learning methods for interpretable predictions of patient outcomes using electronic health records

> **NIH K99** · UNIVERSITY OF PENNSYLVANIA · 2020 · $89,401

## Abstract

This project proposes new methods for representing data in electronic health records (EHR) to improve predictive modeling and interpretation of patient outcomes. EHR data offer a promising opportunity for advancing the understanding of how clinical decisions and patient conditions interact over time to influence patient health. However, EHR data are difficult to use for predictive modeling due to the various data types they contain (continuous, categorical, text, etc.), their longitudinal nature, the high amount of non-random missingness for certain measurements, and other concerns. Furthermore, patient outcomes often have heterogeneous causes and require information to be synthesized from several clinical lab measures and patient visits. The core challenge at hand is overcoming the mismatch between data representations in the EHR and the assumptions underlying commonly used statistical and machine learning (ML) methods. To this end, this project proposes novel wrapper-based methods for learning informative features from EHR data. Both methods propose specialized operators to handle sequential data, time delays, and variable interactions, and have the capacity to discover underlying clinical rules/decisions that affect patient outcomes. Importantly, both methods also produce archives of possible models that represent the best trade-offs between complexity and accuracy, which assists in model interpretation. These methodological advances are made possible by encoding a rich set of data operations as nodes in a directed acyclic graph, and optimizing the graph structures using multi-objective optimization. The central hypothesis of this research is that multi-objective optimization can learn effective data representations from the EHR to produce accurate, explanatory models of patient outcomes. Preliminary work has shown that these methods can effectively learn low-order data representations that improve the predictive ability of several state-of-the-art ML methods. This technique demonstrates good scaling properties with high-dimensional biomedical data.

Aim 1 (K99) is to develop a multi-objective feature engineering method that pairs with existing ML methods to iteratively improve their performance by constructing new features from the raw data and using feedback from the trained model to guide feature construction. In Aim 2 (K99), this method is applied to form predictive models of the risk of heart disease and heart failure using longitudinal EHR data. The resultant models will be interpreted with the help of mentors in order to translate predictions into clinical recommendations. For Aim 3 (R00), a second method is proposed that uses a similar framework to optimize existing neural network approaches in order to simplify their structure as much as possible while maintaining accuracy. The goal of Aim 4 (R00) is to identify hospital patients who are at risk of readmission and propose point-of-care strategies to mitigate th...
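
The abstract's "archives of possible models that represent the best trade-offs between complexity and accuracy" can be read as a Pareto front over two objectives. The following is a minimal sketch of that idea only, not the project's actual method: the `Candidate` type, the `dominates` and `pareto_archive` helpers, and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate model scored on two objectives (both minimized)."""
    name: str
    complexity: int   # e.g. number of nodes in the feature graph (assumed measure)
    error: float      # e.g. validation error (assumed measure)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.complexity <= b.complexity and a.error <= b.error
    strictly_better = a.complexity < b.complexity or a.error < b.error
    return no_worse and strictly_better


def pareto_archive(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only non-dominated candidates, ordered from simplest to most complex."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.complexity)


# Made-up scores, for illustration only.
candidates = [
    Candidate("m1", 2, 0.30),
    Candidate("m2", 5, 0.20),
    Candidate("m3", 5, 0.25),   # dominated by m2 (same complexity, higher error)
    Candidate("m4", 9, 0.18),
    Candidate("m5", 12, 0.19),  # dominated by m4 (more complex, higher error)
]

archive = pareto_archive(candidates)
for c in archive:
    print(f"{c.name}: complexity={c.complexity}, error={c.error:.2f}")
```

Each model left in the archive is one that cannot be made simpler without losing accuracy, or more accurate without adding complexity, which is what lets a reader choose the simplest model that is accurate enough for interpretation.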

## Key facts

- **NIH application ID:** 9936444
- **Project number:** 5K99LM012926-02
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** William La Cava
- **Activity code:** K99
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $89,401
- **Award type:** 5
- **Project period:** 2019-06-01 → 2021-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9936444

## Citation

> US National Institutes of Health, RePORTER application 9936444, Multi-objective representation learning methods for interpretable predictions of patient outcomes using electronic health records (5K99LM012926-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/9936444. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*