# Semi-Automating Data Extraction for Systematic Reviews

> **NIH R01** · NORTHEASTERN UNIVERSITY · 2020 · $291,873

## Abstract

Summary: Semi-Automating Data Extraction for Systematic Reviews (Renewal)

Evidence-based Medicine (EBM) aims to inform patient care using all available evidence.
Realizing this aim in practice would require access to concise, comprehensive, and up-to-date
structured summaries of the evidence relevant to a particular clinical question. Systematic
reviews of biomedical literature aim to provide such summaries, and are a critical component of
the EBM arsenal and modern medicine more generally. However, such reviews are extremely
laborious to conduct. Furthermore, owing to the rapid expansion of the biomedical literature
base, they tend to go out of date quickly as new evidence emerges. These factors hinder the
practice of evidence-based care.

In this renewal proposal, we seek to continue our ground-breaking efforts on developing,
evaluating, and deploying novel machine learning (ML) and natural language processing (NLP)
methods to automate or semi-automate the evidence synthesis process. This will extend our
innovative and successful efforts developing RobotReviewer and related technologies under the
current grant. Concretely, for this renewal we propose to move from extraction of clinically
salient data elements from individual trials to synthesis of these elements across trials. Our first
aim is to extend our ML and NLP models to produce (as one deliverable) a publicly available,
continuously and automatically updated semi-structured evidence database, comprising
extracted data for all evidence, both published and unpublished. Unpublished trials will be
identified via trial registries.

Taking this up-to-date evidence repository as a starting point, we then propose cutting-edge ML
and NLP models that will generate first drafts of evidence syntheses, automatically. More
specifically, we propose novel neural cross-document summarization models that will capitalize
on the semi-structured information automatically extracted by our existing models, in addition
to article texts. These models will be deployed in a new version of RobotReviewer, called
RobotReviewerLive, intended to be a prototype for “living” systematic reviews. To rigorously
evaluate the practical utility of the proposed methodological innovations, we will pilot their use
to support real, ongoing, exemplar living reviews.

## Key facts

- **NIH application ID:** 9990898
- **Project number:** 5R01LM012086-06
- **Recipient organization:** NORTHEASTERN UNIVERSITY
- **Principal Investigator:** Iain Marshall
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $291,873
- **Award type:** 5
- **Project period:** 2015-09-20 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9990898

## Citation

> US National Institutes of Health, RePORTER application 9990898, Semi-Automating Data Extraction for Systematic Reviews (5R01LM012086-06). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9990898. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
