A structured multi-scale dataset with prostate MRI for AI/ML research

NIH RePORTER · NIH · R01 · $312,000

Abstract

PROJECT SUMMARY

Magnetic resonance imaging (MRI) can provide detailed anatomical and functional information about the prostate, but radiologists currently report only a limited set of characteristics, often in a subjective and qualitative manner. Recent developments in artificial intelligence and machine learning (AI/ML) have demonstrated that AI/ML models can complement current qualitative MRI interpretation and help overcome its limitations by learning, directly from the data, hierarchical features and subtle patterns that are predictive of clinically significant prostate cancer. A challenge in AI/ML is providing an adequate number of validated annotations (e.g., contours of prostate cancer lesions and Gleason scores for each lesion) to ensure that "ground truth" labels are unbiased and biologically relevant. Presently, these ground truth labels are commonly obtained from histopathologically confirmed findings, which can come from either biopsy or surgical specimens. However, findings from these two sources are often discordant. In particular, biopsy-based histopathology results are known to be biased and/or uncertain because of (1) interpretation variability among pathologists, (2) lesions with borderline grades, and (3) biopsy sampling error. This discrepancy directly affects the training, validation, and generalization of AI/ML models. Publicly available prostate MRI datasets exist in The Cancer Imaging Archive, but these datasets use biopsy-confirmed histopathology as ground truth labels. There is therefore a need to address the potential uncertainty and bias in the labeling of prostate MRI datasets.
Building upon an active NIH R01 project (R01-CA248506) that is developing novel quantitative MRI and AI/ML methods to predict clinically significant prostate cancer with respect to surgical pathology, the objective of this project is to improve the AI/ML-readiness of prostate MRI data by linking multiscale clinical, radiologic, and pathologic information, from both biopsy and surgery, within the same cohort. Our research team will build an AI-ready multiscale dataset by integrating prostate MRI with different histopathology analyses within the same cohort. This will allow direct comparison and validation of AI/ML models trained with different ground truth labels, and may provide ways to combine this dataset with other publicly available AI/ML datasets. The investigative team has been augmented with experts in MRI-ultrasound fusion biopsy and biomedical informatics to develop a multiscale dataset that is ready for training and validating AI/ML algorithms. Successful completion of the proposed work will result in: (1) a unique dataset of consented subjects who underwent prostate MRI as well as both biopsy and prostatectomy; and (2) structured clinical, radiologic, and pathologic findings shared in a standardized manner with a clearly defined data dictionary. This augmented population and toolkit will enable further refinement and improvements in AI/ML models for image t...

Key facts

NIH application ID: 10593499
Project number: 3R01CA248506-03S1
Recipient: UNIVERSITY OF CALIFORNIA LOS ANGELES
Principal Investigator: Kyung Hyun Sung
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $312,000
Award type: 3 (supplement)
Project period: 2020-03-01 → 2025-02-28