# Learning Universal Patient Representations with Hierarchical Transformers

> **NIH R01** · BOSTON CHILDREN'S HOSPITAL · 2024 · $571,552

## Abstract

Electronic health records contain a wealth of information about patient health status that can be mined for
multiple purposes, including clinical research and improved decision-making at the point of care. This
information can be represented as structured variables, unstructured text, and images, among other data
types. In this work, we develop new models for representing the unstructured text that take advantage of
powerful neural models called pre-trained transformers. We propose to make these models usable for much
longer texts by adding hierarchical layers to operate over summaries of smaller chunks of text, and shrinking
the size of the encoder that operates on smaller chunks. First, we develop a smaller encoder for sentence and
paragraph-sized texts, by using a technique called extreme distillation that trains smaller models from the
output of larger models. We also propose to pre-train hierarchical models for text, by taking advantage of
smaller encoders like that from the first aim. We take advantage of both public and private datasets and
experiment with different pre-training tasks and architectures. Our final aim proposes to combine
representations learned from text with those from the more mature areas of structured data and images. We
design experiments that answer the question of how best to merge these different information sources, and
apply them to important clinical classification use cases that are likely to require multiple information sources
for accurate performance. Specifically, we address the clinical tasks of predicting injury severity in emergency
departments, and predicting diagnosis and prognosis of patients in intensive care units.
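
The chunk-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. This is not the project's model: the token vectors, the chunk encoder, and the upper layer are all toy stand-ins, and every function name here (`chunk_tokens`, `encode_chunk`, `self_attend`, `encode_document`) is hypothetical. It only shows the data flow, in which a small encoder summarizes each chunk and a second layer then operates over the chunk summaries rather than over every token.

```python
import math
from typing import List

Vector = List[float]


def chunk_tokens(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a long token sequence into consecutive fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_token_vector(token: str, dim: int) -> Vector:
    """Assumption: a deterministic pseudo-embedding standing in for learned ones."""
    seed = sum(ord(ch) * (i + 1) for i, ch in enumerate(token))
    return [math.sin(seed * (k + 1)) for k in range(dim)]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element by element."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def encode_chunk(chunk: List[str], dim: int) -> Vector:
    """Placeholder for the small chunk-level encoder: mean of token vectors."""
    return mean_pool([toy_token_vector(tok, dim) for tok in chunk])


def self_attend(vectors: List[Vector]) -> List[Vector]:
    """Parameter-free dot-product self-attention over chunk summaries.

    A trained model would use learned query/key/value projections; this
    version only shows that the upper layer sees one vector per chunk.
    """
    scale = math.sqrt(len(vectors[0]))
    attended = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended.append([
            sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(len(query))
        ])
    return attended


def encode_document(tokens: List[str], chunk_size: int = 4, dim: int = 8) -> Vector:
    """Two-level encoding: per-chunk summaries, then a layer over the summaries."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    summaries = [encode_chunk(c, dim) for c in chunk_tokens(tokens, chunk_size)]
    return mean_pool(self_attend(summaries))


if __name__ == "__main__":
    note = "patient admitted to the emergency department after a fall from height".split()
    print(len(chunk_tokens(note, 4)), "chunks ->", len(encode_document(note)), "dims")
```

Because the upper layer attends over one vector per chunk, its cost grows with the number of chunks rather than the number of tokens, which is the property that makes longer texts tractable in a design of this kind.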

## Key facts

- **NIH application ID:** 10929940
- **Project number:** 5R01LM012973-05
- **Recipient organization:** BOSTON CHILDREN'S HOSPITAL
- **Principal Investigator:** Timothy A Miller
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $571,552
- **Award type:** 5
- **Project period:** 2019-02-07 → 2027-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10929940

## Citation

> US National Institutes of Health, RePORTER application 10929940, Learning Universal Patient Representations with Hierarchical Transformers (5R01LM012973-05). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10929940. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
