Fine-Grained Spatial Information Extraction For Radiology Reports

NIH RePORTER · NIH · R21 · $269,607

Abstract

Automated medical image classification has seen enormous performance improvements recently, particularly in radiology. The application of these approaches to Alzheimer's Disease (AD), however, has been limited by relatively small datasets and the limited granularity of their corresponding phenotypes. The dataset size issue is problematic because the machine learning (ML) methods that have achieved such remarkable performance often require enormous amounts of labeled data for training. Furthermore, the phenotype granularity issue impedes the targeted study of AD along the lines of what is seen in the “precision medicine” approaches to diseases such as cancer. Solutions exist, however: an increasingly accepted means of acquiring large amounts of labeled data is to apply natural language processing (NLP) to the free-text reports associated with an image. If a radiology report describes a patient's AD-related finding, the associated image(s) can be used to train an image classifier. The parent project to this supplemental proposal (R21EB029575) proposes just such an NLP method while simultaneously solving the granularity issue by extracting fine-grained spatial information from the report. In the parent project, we are developing NLP resources and methods to improve the automated labeling of radiology images using the corresponding study reports. The parent project is not specific to AD (or any disease), so this supplement will enable us to focus on this particularly important disease, which will benefit significantly from improved ML-based imaging. We will focus on MRI and PET scans. The Aims here parallel those of the parent project, each focusing on methods that specifically improve NLP for AD radiological indicator extraction as well as the validation of image classification from the corresponding labels.
These Aims include (1) extending the spatial representation and corpus for Alzheimer's, (2) extending the NLP methods for automatic extraction, and (3) validating the AD-related labels for use in image classification. The long-term impact of this project is to substantially improve AD diagnosis by scaling up the amount of labeled data available to ML-based classifiers. The short-term goal of this supplement is to focus our NLP/Imaging combination research on the complex task of improving AD diagnosis. By extending our project with a specific target for AD, we will initiate a sizable research effort toward this goal.

Key facts

NIH application ID: 10288320
Project number: 3R21EB029575-02S1
Recipient: UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
Principal Investigator: Kirk Edward Roberts
Activity code: R21
Funding institute: NIH
Fiscal year: 2021
Award amount: $269,607
Award type: 3
Project period: 2020-03-01 → 2022-12-31