# Transfer Learning for Digital Curation of the EMR Clinical Narrative

> **NIH R01** · BOSTON CHILDREN'S HOSPITAL · 2022 · $376,125

## Abstract

Project Summary
This proposal responds to PAR 18-796 and seeks support for advancing methods for a transfer learning framework for the digital curation of the Electronic Medical Record (EMR) clinical narrative. As Artificial Intelligence (AI) grows in importance across biomedicine, our proposal tackles a critical AI component: automated annotation of health-related text. Since 2015, the development and application of machine learning (ML) methods have exploded. This growth has been propelled by the convergence of plentiful digitized unstructured data (text, speech, images), hardware, and the refinement of neural networks, or deep learning.

2018 marked a turning point in Natural Language Processing (NLP), particularly for transfer learning through pre-trained models such as Universal Language Model Fine-tuning for Text Classification (ULMFiT), Allen AI's ELMo, and OpenAI's GPT. In November 2018, Google published Bidirectional Encoder Representations from Transformers (BERT). BERT is a transformer-based model pre-trained on massive general-domain text corpora (3.3B words in total). The publication reported that classifiers built on BERT representations outperformed the state of the art (SOTA) on 11 NLP tasks by large margins. The NLP research community quickly took up this new framework. It soon realized, however, that building BERT-style models from scratch is affordable and feasible for only a few groups. Research therefore turned to using these very large models as resources for language representations. Efforts focused on pre-trained models (e.g., BERT) in two ways. They served either as a source of high-quality language features or as a starting point for fine-tuning on a specific task. Fine-tuning means using the model as a checkpoint and re-training it with much smaller amounts of task-specific data to produce predictions. This is typically done by adding one fully-connected layer on top of the representations and training for a few epochs.

This watershed shift in NLP toward transfer learning parallels developments in computer vision a few years earlier. Together with our latest work, it brings to the forefront a critical NLP research topic ripe for exploration: a transfer learning framework for the digital curation of the EMR clinical narrative. The proposed work investigates novel scientific methods for extracting detailed information from health-related text, especially the EMR, the major source of phenotype data for patients. Precise phenotype information is needed to advance translational research, particularly to unravel the effects of genetic, epigenetic, and systems changes on responsiveness. This research is in line with the latest developments in neural deep learning approaches and in AI generally. It is expected to enhance biomedical research and, through it, the health of the public.
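
The transfer learning recipe described above can be sketched in plain Python: pre-trained representations plus one fully-connected layer trained for a few epochs. This is a toy illustration under stated assumptions, not the project's method. The 3-dimensional `PRETRAINED` table is a hypothetical stand-in for a large model such as BERT. The fever-detection task and its four training sentences are invented, and the names `encode` and `LinearHead` are illustrative only. The sketch shows the feature-extraction variant, where the representations stay frozen and only the new layer is trained. Full fine-tuning would also update the encoder's weights.

```python
import math

# Hypothetical 3-d "pre-trained" word vectors standing in for a large model
# such as BERT. They are frozen: training below never modifies them.
PRETRAINED = {
    "fever":   [0.9, 0.1, 0.0],
    "febrile": [0.8, 0.2, 0.1],
    "chills":  [0.7, 0.0, 0.2],
    "clear":   [0.0, 0.9, 0.1],
    "normal":  [0.1, 0.8, 0.0],
}
DIM = 3


def encode(text: str) -> list[float]:
    """Average the frozen vectors of known tokens; unknown tokens are skipped."""
    vecs = [PRETRAINED[t] for t in text.lower().split() if t in PRETRAINED]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


class LinearHead:
    """One fully-connected layer with a sigmoid output for a binary label."""

    def __init__(self, dim: int):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict_proba(self, x: list[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, examples, epochs: int = 5, lr: float = 0.5) -> None:
        """Per-example SGD on log loss; only the head's parameters change."""
        for _ in range(epochs):
            for x, y in examples:
                err = self.predict_proba(x) - y  # d(log loss)/dz
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


# Tiny labeled set for a hypothetical task: does the note mention fever?
train_set = [
    (encode("patient has fever"), 1),
    (encode("lungs clear"), 0),
    (encode("chills overnight"), 1),
    (encode("exam normal"), 0),
]
head = LinearHead(DIM)
head.train(train_set, epochs=5)

# "febrile" never appears in the training set, but its frozen representation
# is close to "fever", so the trained head can still score it.
p_febrile = head.predict_proba(encode("child febrile today"))
p_clear = head.predict_proba(encode("chest clear"))
print(f"febrile: {p_febrile:.2f}  clear: {p_clear:.2f}")
```

The word `febrile` never occurs in the training sentences, so its score depends entirely on its frozen representation lying near `fever` and `chills`. This kind of knowledge transfer is what lets a small task-specific dataset go further than it could on its own.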

## Key facts

- **NIH application ID:** 10468604
- **Project number:** 5R01LM013486-02
- **Recipient organization:** BOSTON CHILDREN'S HOSPITAL
- **Principal Investigator:** GUERGANA K. SAVOVA
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $376,125
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2021-08-12 → 2025-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10468604

## Citation

> US National Institutes of Health, RePORTER application 10468604, Transfer Learning for Digital Curation of the EMR Clinical Narrative (5R01LM013486-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10468604. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*