# Cancer Prevention and Control (CAPAC) Research Training Program

> **NIH R25** · COMPREHENSIVE CANCER CENTER/ UNIV/PR · 2021 · $83,874

## Abstract

Summary
Cancer is the leading cause of death among Hispanics, the largest racial/ethnic minority group in the United
States and one disproportionately affected by cancer health disparities. Despite this disparity, cancer datasets
for Hispanic populations are less available than those for other ethnic groups. Given the need for cancer
health disparities research focused on Hispanic health, there is a need to apply Artificial
Intelligence/Machine Learning (AI/ML) approaches in this field, and an urgent need to make Hispanic cancer
datasets Findable, Accessible, Interoperable, and Reusable (FAIR) and AI/ML-ready. The Cancer Prevention
and Control Research (CAPAC) Training Program of the University of Puerto Rico Comprehensive Cancer
Center (UPRCCC) recruits graduate and health professions students for a hands-on summer research
experience in Puerto Rico (PR). This supplement aims to expand the scope of the parent CAPAC Training Program and train
the research workforce in 1) the techniques and approaches used to manipulate and pre-process Hispanic cancer
datasets to make them FAIR and AI/ML-ready, and 2) the available methods for developing ML-based models
to analyze these data and create predictive models for cancer diagnosis and treatment, with a focus on Hispanic
datasets. We will develop an online course based on the data science project lifecycle, which includes four
phases: 1) Data Understanding/Data Pre-processing; 2) Data Wrangling; 3) Model Planning; and 4) Model
Building. This 24-hour online asynchronous course will be organized in modules within two components.
Component 1 will include the following topics: fundamentals of cancer data types, identifying and understanding
cancer datasets, data science concepts and project lifecycles; basic programming concepts; programming with
Python; exploring, pre-processing, and conditioning the cancer datasets; performing Extract, Transform, Load
(ETL) prior to AI/ML modelling. Component 2 will add topics such as: principles of AI/ML; variable correlations
and associations; determining datasets for training and testing; supervised and unsupervised ML approaches;
classification, regression, and ensemble ML algorithms; and familiarization with ML tools. To develop our course,
examples, and projects, we will use Hispanic cancer datasets from the US and PR. The course will
be voluntary and free for interested participants (capacity of 40 trainees), including CAPAC participants (alumni)
and applicants, CAPAC mentors, as well as trainees and research staff from collaborating grants/institutions.
Students' acquired skills will be evaluated with quizzes and a final practical project, while the course will be
evaluated with the support of the evaluation component of the parent grant. This supplement will contribute to the
development of human resources (e.g., students, researchers, clinicians) from the United States and Puerto Rico
with the competencies and skills needed to make Hispanic cancer datasets FAIR and to apply AI/ML
...

## Key facts

- **NIH application ID:** 10405752
- **Project number:** 3R25CA240120-03S1
- **Recipient organization:** COMPREHENSIVE CANCER CENTER/ UNIV/PR
- **Principal Investigator:** Ana Patricia Ortiz
- **Activity code:** R25
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $83,874
- **Award type:** 3
- **Project period:** 2019-09-01 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10405752

## Citation

> US National Institutes of Health, RePORTER application 10405752, Cancer Prevention and Control (CAPAC) Research Training Program (3R25CA240120-03S1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10405752. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
