THE CANCER EPITOPE DATABASE AND ANALYSIS RESOURCE

NIH RePORTER · NIH · U24 · $366,000

Abstract

Project Summary: The primary goal of our proposal is to improve the AI/ML-readiness of data in the Cancer Epitope Database and Analysis Resource (CEDAR). CEDAR provides a catalog of manually curated information from journal articles that describe the specific molecular targets of adaptive immune responses, also called 'epitopes', in tumor cells. Like many other curated resources in the biomedical domain, CEDAR tracks the provenance of the statements it captures by identifying the journal article (which is straightforward via its PubMed record) and the specific part of the article that contains the curated data. Here we propose to make the data location field in CEDAR accessible to AI/ML approaches, which will allow information extracted by algorithms from a free-text article to be compared with the curated data in CEDAR. Specifically, we will: 1) standardize how the 'data location' in a manuscript is captured in CEDAR; and 2) programmatically link data locations to parts of journal articles in PubMed Central (PMC). Completing these Aims will make the CEDAR data more valuable for large language models (LLMs) and related AI applications. Moreover, the approaches and code developed will be applicable to the many other biomedical knowledgebases that curate data from the literature and capture its specific location.
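
As an illustration of what standardizing a 'data location' could involve, the following is a minimal Python sketch, not the project's actual method. It assumes, hypothetically, that curators record locations as free text such as "Fig. 2a" or "Table S1"; the abstract does not describe the real conventions used in CEDAR. The names DataLocation and normalize_location, and the accepted spellings, are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class DataLocation(NamedTuple):
    kind: str   # "figure" or "table"
    label: str  # e.g. "2A", "S1"


# Hypothetical pattern: the free-text conventions actually used in CEDAR are
# not given in the abstract, so the spellings accepted here are assumptions.
_LOCATION_RE = re.compile(
    r"^\s*(?P<kind>fig(?:ure)?|table|tab)\.?\s*(?P<label>S?\d+[A-Za-z]?)\s*$",
    re.IGNORECASE,
)


def normalize_location(raw: str) -> Optional[DataLocation]:
    """Map a free-text location such as 'Fig. 2a' to a structured record."""
    match = _LOCATION_RE.match(raw)
    if match is None:
        return None  # leave unrecognized strings for manual review
    kind = "figure" if match.group("kind").lower().startswith("fig") else "table"
    return DataLocation(kind, match.group("label").upper())
```

A structured record of this kind could then serve as a lookup key when matching a curated entry to the corresponding figure or table element of an article's full text in PMC. Strings that do not match return None, so they can be flagged for review instead of being guessed at.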

Key facts

NIH application ID: 10842172
Project number: 3U24CA248138-03S1
Recipient: LA JOLLA INSTITUTE FOR IMMUNOLOGY
Principal Investigator: Bjoern Peters
Activity code: U24
Funding institute: NIH
Fiscal year: 2023
Award amount: $366,000
Award type: 3
Project period: 2023-09-18 → 2026-04-30