# Center for Collaborative Research in Minority Health and Health Disparities

> **NIH U54** · UNIVERSITY OF PUERTO RICO MED SCIENCES · 2021 · $353,667

## Abstract

Hispanics are one of the largest racial/ethnic minority groups in the United States and are
disproportionately affected by health issues. Datasets for Hispanic populations are less available
than those for other ethnic groups. Data Science (DS) is an interdisciplinary field that aims to extract knowledge
and insights from structured and unstructured data. Artificial intelligence (AI) is an area of
computer science concerned with building smart machines capable of "thinking". Machine
Learning (ML) refers to analytical algorithms that iteratively learn from data. Thus, given the need
for health disparities research with a focus on Hispanic health, there is an urgent need to strengthen
and enhance the diversity of the NIH-funded workforce by using DS to make Hispanic
datasets Findable, Accessible, Interoperable, and Reusable (FAIR), and by applying AI/ML
approaches in this field to extract knowledge from Hispanic datasets to mitigate health disparities.
In response to the Notice of Special Interest (NOSI) "Administrative Supplements to Enhance
Data Science Capacity at NIMHD-Funded Research Centers in Minority Institutions
(RCMI)", we aim to enhance and build capacity for investigators and students in DS/AI/ML
topics such as JupyterHub, coding with R, RStudio, and Python, using ML libraries, and
other cutting-edge techniques to address and mitigate Hispanic health disparities. We will
develop a new course, "Applying Artificial Intelligence and Machine Learning to Health
Disparities Research" (AIML+HDR), focused on data analysis using Hispanic datasets, that
represents multiple levels and domains of influence in the NIMHD Research Framework. This
bilingual (Spanish and English) online asynchronous course will initially target mainly trainees at
the University of Puerto Rico and other Hispanic institutions. The AIML+HDR course
will follow the data science project lifecycle and will be divided into two modules. Module I will focus
on using DS to make Hispanic datasets FAIR and will include units on data
understanding and data wrangling. Module II will add topics on creating AI/ML predictive models
for the diagnosis and treatment of Hispanic patients, including units on the model planning
and building phases. To develop examples and projects, we will use Hispanic datasets from public
repositories such as the Surveillance, Epidemiology, and End Results (SEER) Program and the All of
Us project, and from private repositories such as AbartysHealth. This novel training course will
diversify the NIH-funded data science workforce through the development of competencies and
skills in Hispanic investigators and graduate students, who will then generate Hispanic datasets
that are FAIR and apply AI/ML approaches to create relevant predictive models.

## Key facts

- **NIH application ID:** 10435854
- **Project number:** 3U54MD007600-35S2
- **Recipient organization:** UNIVERSITY OF PUERTO RICO MED SCIENCES
- **Principal Investigator:** Emma Fernandez-Repollet
- **Activity code:** U54
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $353,667
- **Award type:** 3
- **Project period:** 1997-09-01 → 2022-09-19

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10435854

## Citation

> US National Institutes of Health, RePORTER application 10435854, Center for Collaborative Research in Minority Health and Health Disparities (3U54MD007600-35S2). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/10435854. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
