Semi-Automating Data Extraction for Systematic Reviews

NIH RePORTER · NIH · R01 · $291,873

Abstract

Summary: Semi-Automating Data Extraction for Systematic Reviews (Renewal)

Evidence-based Medicine (EBM) aims to inform patient care using all available evidence. Realizing this aim in practice would require access to concise, comprehensive, and up-to-date structured summaries of the evidence relevant to a particular clinical question. Systematic reviews of biomedical literature aim to provide such summaries, and are a critical component of the EBM arsenal and of modern medicine more generally. However, such reviews are extremely laborious to conduct. Furthermore, owing to the rapid expansion of the biomedical literature base, they tend to go out of date quickly as new evidence emerges. These factors hinder the practice of evidence-based care.

In this renewal proposal, we seek to continue our ground-breaking efforts on developing, evaluating, and deploying novel machine learning (ML) and natural language processing (NLP) methods to automate or semi-automate the evidence synthesis process. This will extend our innovative and successful efforts developing RobotReviewer and related technologies under the current grant. Concretely, for this renewal we propose to move from extraction of clinically salient data elements from individual trials to synthesis of these elements across trials. Our first aim is to extend our ML and NLP models to produce (as one deliverable) a publicly available, continuously and automatically updated semi-structured evidence database, comprising extracted data for all evidence, both published and unpublished. Unpublished trials will be identified via trial registries. Taking this up-to-date evidence repository as a starting point, we then propose cutting-edge ML and NLP models that will automatically generate first drafts of evidence syntheses.

More specifically, we propose novel neural cross-document summarization models that will capitalize on the semi-structured information automatically extracted by our existing models, in addition to article texts. These models will be deployed in a new version of RobotReviewer, called RobotReviewerLive, intended to be a prototype for “living” systematic reviews. To rigorously evaluate the practical utility of the proposed methodological innovations, we will pilot their use to support real, ongoing, exemplar living reviews.

Key facts

NIH application ID: 9990898
Project number: 5R01LM012086-06
Recipient: NORTHEASTERN UNIVERSITY
Principal Investigator: Iain Marshall
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $291,873
Award type: 5
Project period: 2015-09-20 → 2023-06-30