# Natural Language Processing Platform for Cancer Surveillance

> **NIH UH3** · BOSTON CHILDREN'S HOSPITAL · 2021 · $660,155

## Abstract

This UG3/UH3 proposal, titled “Natural Language Processing Platform for Cancer Surveillance”, responds to Research Area 1 of PAR 16-349 (https://grants.nih.gov/grants/guide/pa-files/par-16-349.html). It specifically addresses the development of natural language processing (NLP) tools to facilitate automatic, unsupervised, or minimally supervised extraction of specific discrete cancer-related data from various types of unstructured electronic medical records (EMRs) related to the activities of cancer registries. It is submitted through a multi-PI mechanism: Prof. Guergana Savova from Boston Children’s Hospital/Harvard Medical School, Dr. Jeremy Warner from Vanderbilt University Medical Center, Prof. Harry Hochheiser from the University of Pittsburgh, and Prof. Eric Durbin from the Kentucky Cancer Registry/University of Kentucky.

The current proposal builds on prior work funded by the NCI Informatics Tools for Cancer Research (ITCR) program (https://itcr.cancer.gov/). We envision building on our work to date to advance methods for extracting the clinical phenotyping data needed to fuel a new cancer surveillance paradigm that would benefit hospital-based, state-based, and national cancer registries. In this new paradigm, surveillance programs would use these methods to enhance the speed, accuracy, and ease of cancer reporting. The proposed DeepPhe*CR platform could be deployed at local sites or centrally, and could eventually be integrated into existing or new visualization and abstraction tools as needed by the cancer surveillance community.

Although there has been some previous work on automatic phenotype extraction from various data streams, including the clinical narrative, for specific types of cancer or individual cancer surveillance variables, the proposed work will be a step towards generalizable information extraction. This generalizability enables extensibility and scalability. Interoperability is reinforced through the modeling part of the proposed project, which is grounded in the most recent advances in biomedical ontologies, terminologies, and community-adopted conventions and standards. Our planned partnership with three SEER cancer registries provides our decision-making processes with a solid foundation in large-scale cancer surveillance.

## Key facts

- **NIH application ID:** 10441803
- **Project number:** 4UH3CA243120-03
- **Recipient organization:** BOSTON CHILDREN'S HOSPITAL
- **Principal Investigator:** Eric B. Durbin
- **Activity code:** UH3
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $660,155
- **Award type:** 4N
- **Project period:** 2019-07-19 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10441803

## Citation

> US National Institutes of Health, RePORTER application 10441803, Natural Language Processing Platform for Cancer Surveillance (4UH3CA243120-03). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/10441803. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
