# A structured multi-scale dataset with prostate MRI for AI/ML research

> **NIH R01** · UNIVERSITY OF CALIFORNIA LOS ANGELES · 2022 · $312,000

## Abstract

Magnetic resonance imaging (MRI) can provide detailed anatomical and functional information about the prostate,
but radiologists presently report on limited characteristics, often in a subjective and qualitative manner. Recent
developments in artificial intelligence and machine learning (AI/ML) have demonstrated that AI/ML models can
complement and overcome the obstacles of current qualitative MRI interpretation by learning important
hierarchical features and subtle patterns that are predictive of clinically significant prostate cancer from the data.
A challenge in AI/ML is providing an adequate number of validated annotations (e.g., contours of prostate cancer
lesions, Gleason scores for each lesion) to ensure that "ground truth" labels are unbiased and biologically
relevant. Presently, these ground truth labels are commonly obtained from histopathologically confirmed
findings, which can come from either biopsy or surgical specimens. However, these two sources of histopathological findings
are often discordant. In particular, biopsy-based histopathology results are known to be biased and/or uncertain
due to (1) interpretation variability among pathologists, (2) lesions with borderline grades, and (3) biopsy
sampling error. This discrepancy directly affects the training, validation, and generalization of AI/ML models.
To date, publicly available prostate MRI datasets exist in The Cancer Imaging Archive, but these datasets utilize
biopsy-confirmed histopathology as ground truth labels. There is a need to address the potential uncertainty and
bias in data labeling of the prostate MRI datasets. Building upon an active NIH R01 project (R01-CA248506) that
is developing novel quantitative MRI and AI/ML methods to predict clinically significant prostate cancer with
respect to surgical pathology, the objective of this project is to improve the AI/ML-readiness of the prostate MRI
data by linking multiscale information across clinical, radiologic, and pathologic data from biopsy and surgery
within the same cohort. Our research team will build an AI-ready multiscale dataset by integrating prostate MRI
with different histopathology analyses within the same cohort. This will allow direct comparison and validation of
different AI/ML models when different ground truth labels are used, providing potential ways to combine other
publicly available AI/ML datasets. The investigative team was augmented with experts in MRI-ultrasound fusion
biopsy and biomedical informatics to develop a multiscale dataset that is ready for training and validation of
AI/ML algorithms. Successful completion of the proposed work will result in: (1) a unique dataset of consented
subjects who underwent prostate MRI and both biopsies and prostatectomy; and (2) structured clinical,
radiologic, and pathologic findings shared in a standardized manner with a clearly defined data dictionary. This
augmented population and toolkit will enable further refinement and improvements in AI/ML models for image t...

## Key facts

- **NIH application ID:** 10593499
- **Project number:** 3R01CA248506-03S1
- **Recipient organization:** UNIVERSITY OF CALIFORNIA LOS ANGELES
- **Principal Investigator:** Kyung Hyun Sung
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $312,000
- **Award type:** 3
- **Project period:** 2020-03-01 → 2025-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10593499

## Citation

> US National Institutes of Health, RePORTER application 10593499, A structured multi-scale dataset with prostate MRI for AI/ML research (3R01CA248506-03S1). Retrieved via AI Analytics 2026-05-29 from https://api.ai-analytics.org/grant/nih/10593499. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
