# Evidence Extraction Systems for the Molecular Interaction Literature

> **NIH R01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2020 · $264,232

## Abstract

Burns, Gully A.
In primary research articles, scientists make claims based on evidence from experiments, and report both the claims and the supporting evidence in the results section of papers. Biomedical databases, however, describe the claims made by scientists in detail but rarely describe the supporting evidence that a consulting scientist could use to understand why the claims are being made. Currently, the process of curating evidence into databases is manual, time-consuming and expensive; thus, evidence is recorded in papers but not generally captured in database systems. For example, the European Bioinformatics Institute's INTACT database describes in detail how different molecules biochemically interact with each other. Yet it characterizes the underlying experiment that provides the evidence for each interaction with only two hierarchical variables: a code denoting the method used to detect the molecular interaction and another code denoting the method used to detect each molecule. In fact, INTACT describes 94 different types of interaction detection method, each of which could be combined with other experimental methods and used in a variety of ways to reveal different details about the interaction. This crucial information is not being captured in databases. Although experimental evidence is complex, it conforms to certain principles of experimental design: experimentally studying a phenomenon typically involves measuring well-chosen dependent variables whilst altering the values of equally well-chosen independent variables. Exploiting these principles has permitted us to devise a preliminary, robust, general-purpose representation for experimental evidence. In this project, we will use this representation to describe the methods and data pertaining to the evidence that underpins the interpretive assertions about molecular interactions described by INTACT. A key contribution of our project is that we will develop methods to extract this evidence from scientific papers automatically (A) by using image processing on a specific subtype of figure that is common in molecular biology papers and (B) by using natural language processing to read information from the text used by scientists to describe their results. We will develop these tools for the INTACT repository but package them so that they may then also be used for evidence pertaining to other areas of research in biomedicine.
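
The abstract does not specify the evidence representation itself. As a rough illustration only, the idea of pairing a claim with two method codes plus the independent and dependent variables of the supporting experiment could be sketched as below. The class names, field names, and example values (`EvidenceRecord`, `Variable`, the co-immunoprecipitation example) are hypothetical and are not taken from the project or from the INTACT schema.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A named experimental variable and the values it took (hypothetical type)."""
    name: str
    values: list


@dataclass
class EvidenceRecord:
    """Hypothetical record linking a claim to the experiment that supports it."""
    claim: str
    # The two method descriptors the abstract attributes to INTACT,
    # stored here as free-text labels rather than real vocabulary codes.
    interaction_detection_method: str
    participant_detection_method: str
    # Variables the experimenter altered and measured, respectively.
    independent: list = field(default_factory=list)
    dependent: list = field(default_factory=list)

    def design_summary(self) -> str:
        """Summarize the design as 'measured [dependent] while varying [independent]'."""
        ind = ", ".join(v.name for v in self.independent) or "none"
        dep = ", ".join(v.name for v in self.dependent) or "none"
        return f"measured [{dep}] while varying [{ind}]"


# Illustrative values only; not drawn from any curated entry.
record = EvidenceRecord(
    claim="protein A binds protein B",
    interaction_detection_method="coimmunoprecipitation",
    participant_detection_method="western blot",
    independent=[Variable("antibody", ["anti-A", "IgG control"])],
    dependent=[Variable("band for protein B", ["present", "absent"])],
)
print(record.design_summary())
```

A record with only the two method labels filled in corresponds to what the abstract says databases capture today; the `independent` and `dependent` lists stand in for the additional detail the project aims to extract.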

## Key facts

- **NIH application ID:** 9983144
- **Project number:** 5R01LM012592-04
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Nanyun Violet Peng
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $264,232
- **Award type:** 5
- **Project period:** 2017-09-01 → 2022-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9983144

## Citation

> US National Institutes of Health, RePORTER application 9983144, Evidence Extraction Systems for the Molecular Interaction Literature (5R01LM012592-04). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/9983144. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
