Improving AI/ML-Readiness of data generated from NIH-funded research on oral cancer screening

NIH RePORTER · NIH · R01 · $282,307

Abstract

Oral and oropharyngeal squamous cell carcinoma (OSCC) together rank as the sixth most common cancer worldwide, accounting for 400,000 new cancer cases each year. Two-thirds of these cancers occur in low- and middle-income countries (LMICs). While the 5-year survival rate in the U.S. is 62%, the survival rate is only 10-40% and the cure rate is around 30% in the developing world. To meet the need for technologies that enable comprehensive oral cancer screening and diagnosis in low-resource settings (LRS), we formed an interdisciplinary team under the parent R01DE030682 project, titled “Multimodal Intraoral Imaging System for Oral Cancer Detection and Diagnosis in Low Resource Setting”, with complementary expertise in optical imaging, oncology, deep learning, technology translation, and commercialization to develop, validate, and clinically translate a multimodal intraoral imaging system for oral cancer detection and diagnosis. We will achieve the project objective through three Aims: (1) develop a portable, semi-flexible, and compact multimodal intraoral imaging system; (2) evaluate the clinical feasibility of the prototyped intraoral imaging system and develop deep-learning-based image processing algorithms for early detection, diagnosis, and mapping of oral dysplastic and malignant lesions; and (3) validate the capability of the prototyped intraoral imaging system for diagnosing oral dysplasia and malignant lesions. In our UH3CA239682 project, titled “Low-cost Mobile Oral Cancer Screening for Low Resource Setting”, we have screened ~7,000 high-risk individuals for oral cancer, obtaining at least two pairs of dual-modal images (white light and autofluorescence) from each patient, for a total of more than 28,000 de-identified images with related information. It is the largest image dataset on oral cancers.
With this Administrative Supplement, we will make the image data AI/ML-ready by improving data compatibility with AI/ML tools, cleaning the dataset, balancing the data, reducing uncertainty, improving the interoperability of the data with ontology, and improving the trustworthiness of AI/ML models using pixel-level annotation. We will also demonstrate the use of the transformed data in AI/ML applications through (1) multi-class oral cancer classification using the transformed multi-modal data and (2) an interpretable and trustworthy AI model using image-level labels and pixel-level annotation. The image data and machine learning models will be available through The University of Arizona Research Data Repository (ReDATA). Completion of this project will accelerate the development of AI/ML-based techniques for early oral cancer detection in low-resource settings, reducing morbidity and mortality. It will make the data FAIR (Findable, Accessible, Interoperable, and Reusable) with high impact for open science, contributing to the NIH vision of a modernized and integrated biomedical data ecosystem. The parent R01 project will directly benefit from this dataset and the developed ...

Key facts

NIH application ID: 10594120
Project number: 3R01DE030682-02S1
Recipient: UNIVERSITY OF ARIZONA
Principal Investigator: Rongguang Liang
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $282,307
Award type: 3
Project period: 2021-08-10 → 2024-07-31