# An AI/ML-ready Dataset for Investigating the Effect of Variations in CT Acquisition and Reconstruction

> **NIH R01** · UNIVERSITY OF CALIFORNIA LOS ANGELES · 2023 · $288,554

## Abstract

Quantitative image features (QIFs) such as radiomic and deep features hold enormous potential to improve the
detection, diagnosis, and treatment assessment of various diseases. When extracting QIFs from computed
tomography (CT) scans, computed values can vary based on differences in CT acquisition and reconstruction
parameters, including radiation dose level, slice thickness, reconstruction kernel, and reconstruction method.
The performance of artificial intelligence (AI) and machine learning (ML) models depends on the diversity of data
on which the model was trained. Previous studies have shown the negative impact that differences in CT
acquisition and reconstruction have on the reproducibility of radiomic feature values and the performance of
AI/ML models. However, there is a dearth of real-world datasets that AI/ML developers and researchers
can easily leverage to train and validate models that are robust to these differences. The objective of this
supplement is to improve the AI/ML-readiness of real-world patient CT datasets, facilitating investigations into
characterizing and mitigating the effect of variations in CT acquisition and reconstruction parameters. This project
builds upon our parent R01 project (R01 EB031993, Computational Toolkit for Normalizing the Impact of CT
Acquisition and Reconstruction on Quantitative Image Features), which aims to understand the effect of these
variations on downstream AI/ML models and clinical tasks (e.g., nodule detection, stroke characterization) and
develop effective methods for image harmonization. This project will bring together expertise in informatics,
medical physics, and data/model sharing standards. In Aim 1, we will release an AI/ML-ready CT dataset of 200
chest CT scans of patients who underwent lung cancer screening and 100 non-contrast head CTs of patients
with suspected stroke. Each scan will be reconstructed by varying dose, slice thickness, and kernel, resulting in
over 30 different versions of the same scan. Scans will also be annotated (e.g., outlined nodule boundaries) and
linked with clinical information (e.g., nodule characteristics, pathology-confirmed lung cancer diagnosis).
Following FAIR principles, clinical data, scans, and annotations will be released using established common data
elements and standards such as DICOM segmentation objects. In Aim 2, we will demonstrate the utility of this
dataset as a benchmark for assessing the reliability and robustness of AI/ML algorithms. We will use the
benchmark CT dataset to evaluate the performance of publicly available algorithms for lung nodule detection
and characterization and ischemic volume estimation. We will assess the robustness of these algorithms’
performance using metrics such as sensitivity and false positives/scan (nodule detection), area under the
receiver operating characteristic curve (nodule classification), and mean absolute error (stroke quantification)
across different scans. Successful completion of this p...
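
The robustness assessment described in Aim 2 could look roughly like the sketch below: group per-scan results by reconstruction condition, compute sensitivity and false positives per scan for each condition, and report how much the metric shifts between conditions. This is a minimal illustration, not the project's evaluation code. The condition labels, the counts, the volume pairs, and the function names `detection_metrics` and `mean_absolute_error` are all hypothetical; the abstract does not specify the dose levels, slice thicknesses, or kernels used. The area under the ROC curve is left out for brevity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-scan detection results; every value is made up for illustration.
# Each record: (reconstruction condition, true nodules, detected true nodules, false positives)
detections = [
    ("100% dose / 1.0 mm / smooth", 4, 4, 1),
    ("100% dose / 1.0 mm / smooth", 2, 2, 0),
    ("10% dose / 2.0 mm / sharp", 4, 3, 3),
    ("10% dose / 2.0 mm / sharp", 2, 1, 2),
]


def detection_metrics(records):
    """Return {condition: (sensitivity, false positives per scan)}."""
    totals = defaultdict(lambda: [0, 0, 0, 0])  # nodules, hits, false positives, scans
    for condition, n_true, n_hit, n_fp in records:
        t = totals[condition]
        t[0] += n_true
        t[1] += n_hit
        t[2] += n_fp
        t[3] += 1
    return {
        condition: (hits / n_true, n_fp / n_scans)
        for condition, (n_true, hits, n_fp, n_scans) in totals.items()
    }


def mean_absolute_error(pairs):
    """Mean absolute difference over (predicted, reference) value pairs."""
    return mean(abs(predicted - reference) for predicted, reference in pairs)


metrics = detection_metrics(detections)
for condition, (sensitivity, fp_per_scan) in metrics.items():
    print(f"{condition}: sensitivity={sensitivity:.2f}, FP/scan={fp_per_scan:.1f}")

# A simple robustness summary: the spread of sensitivity across conditions.
sensitivities = [s for s, _ in metrics.values()]
print(f"sensitivity range across conditions: {max(sensitivities) - min(sensitivities):.2f}")

# Hypothetical (predicted, reference) ischemic volumes in mL for one condition.
print(f"volume MAE: {mean_absolute_error([(12.0, 10.0), (30.0, 33.0)]):.1f} mL")
```

A small spread across conditions would suggest the algorithm is robust to the acquisition and reconstruction changes; a large spread flags sensitivity to them. A real evaluation would also need a rule for matching detections to annotated nodules, which is not shown here.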

## Key facts

- **NIH application ID:** 10842635
- **Project number:** 3R01EB031993-02S1
- **Recipient organization:** UNIVERSITY OF CALIFORNIA LOS ANGELES
- **Principal Investigator:** William Hsu
- **Activity code:** R01 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $288,554
- **Award type:** 3
- **Project period:** 2022-09-01 → 2025-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10842635

## Citation

> US National Institutes of Health, RePORTER application 10842635, An AI/ML-ready Dataset for Investigating the Effect of Variations in CT Acquisition and Reconstruction (3R01EB031993-02S1). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/10842635. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*