Cancer Prevention and Control (CAPAC) Research Training Program

NIH RePORTER · NIH · R25 · $83,874

Abstract

Cancer is the leading cause of death among Hispanics, the largest racial/ethnic minority group in the United States and one disproportionately affected by cancer health disparities. Despite this burden, cancer datasets for Hispanic populations are less available than those for other ethnic groups. Cancer health disparities research focused on Hispanic health therefore needs Artificial Intelligence/Machine Learning (AI/ML) approaches, and there is an urgent need to make Hispanic cancer datasets Findable, Accessible, Interoperable, and Reusable (FAIR) and AI/ML-ready.

The Cancer Prevention and Control (CAPAC) Research Training Program of the University of Puerto Rico Comprehensive Cancer Center (UPRCCC) recruits graduate and health professions students for a hands-on summer research experience in Puerto Rico (PR). This supplement aims to expand the scope of the parent CAPAC training program and prepare the research workforce in 1) the techniques and approaches used to manipulate and pre-process Hispanic cancer datasets to make them FAIR and AI/ML-ready, and 2) the available methods for developing ML-based models to analyze these data and create predictive models for cancer diagnosis and treatment, with a focus on Hispanic datasets.

We will develop an online course based on the data science project lifecycle, which includes four phases: 1) Data Understanding/Data Pre-processing; 2) Data Wrangling; 3) Model Planning; and 4) Model Building. This 24-hour, asynchronous online course will be organized into modules within two components. Component 1 will cover fundamentals of cancer data types; identifying and understanding cancer datasets; data science concepts and project lifecycles; basic programming concepts; programming with Python; exploring, pre-processing, and conditioning cancer datasets; and performing Extract, Transform, Load (ETL) prior to AI/ML modeling.
Component 2 will add topics such as principles of AI/ML; variable correlations and associations; selecting datasets for training and testing; supervised and unsupervised ML approaches; classification, regression, and ensemble ML algorithms; and familiarization with ML tools. To develop the course, examples, and projects, we will use Hispanic cancer datasets from the US and PR. The course will be voluntary and free for interested participants (capacity of 40 trainees), including CAPAC participants (alumni) and applicants, CAPAC mentors, and trainees and research staff from collaborating grants/institutions. Trainees' skills will be evaluated with quizzes and a final practical project, while the course itself will be evaluated with the support of the evaluation component of the parent grant. This supplement will support the development of human resources (e.g., students, researchers, clinicians) from the United States and Puerto Rico with the competencies and skills needed to make Hispanic cancer datasets FAIR and to apply AI/ML ...

Key facts

NIH application ID
10405752
Project number
3R25CA240120-03S1
Recipient
COMPREHENSIVE CANCER CENTER/ UNIV/PR
Principal Investigator
ANA Patricia ORTIZ
Activity code
R25
Funding institute
NIH
Fiscal year
2021
Award amount
$83,874
Award type
3
Project period
2019-09-01 → 2024-08-31