# THE CANCER EPITOPE DATABASE AND ANALYSIS RESOURCE

> **NIH U24** · LA JOLLA INSTITUTE FOR IMMUNOLOGY · 2023 · $366,000

## Abstract

The primary goal of our proposal is to improve the AI/ML-readiness of data in the Cancer Epitope Database and
Analysis Resource (CEDAR). CEDAR provides a catalog of manually curated information from journal articles
that describe the specific molecular targets of adaptive immune responses, also called "epitopes", in tumor
cells. Like many other curated resources in the biomedical domain, CEDAR tracks the provenance of the
statements it captures by identifying the journal article (which is straightforward via its PubMed record) and the
specific part of the article where the curated data is contained. Here we propose to make the data location field
in CEDAR accessible to AI/ML approaches, which will allow information extracted by algorithms
from a free-text article to be compared with the curated data in CEDAR. Specifically, we will: 1) Standardize how "data location" in
a manuscript is captured in CEDAR. 2) Programmatically link data locations to parts of journal articles in PubMed
Central (PMC). Completing these Aims will make the CEDAR data more valuable for large language models
(LLMs) and related AI applications. Moreover, the approaches and code developed will be applicable to the many
other biomedical knowledgebases that curate data from the literature and capture its specific location.

## Key facts

- **NIH application ID:** 10842172
- **Project number:** 3U24CA248138-03S1
- **Recipient organization:** LA JOLLA INSTITUTE FOR IMMUNOLOGY
- **Principal Investigator:** Bjoern Peters
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $366,000
- **Award type:** 3
- **Project period:** 2023-09-18 → 2026-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10842172

## Citation

> US National Institutes of Health, RePORTER application 10842172, THE CANCER EPITOPE DATABASE AND ANALYSIS RESOURCE (3U24CA248138-03S1). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10842172. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
