# Supplemental Training in Making Data FAIR and AI/ML Ready

> **NIH T32** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2021 · $86,369

## Abstract

In 2018, the NIH Strategic Plan for Data Science identified a number of goals and cross-cutting themes to
address in order to maximize the value of data generated through NIH-funded efforts. This included the
enhancement of data sharing, access, and interoperability of NIH-supported data resources. One key barrier to
achieving this goal is the lack of biomedical scientists with the ability to apply data science techniques to
maximize the usability of the data and metadata produced by their research. Our NIEHS-supported training
grant (T32 ES007322) provides a single, unified training program for 18 predoctoral students and 8
postdoctoral fellows within the environmental health sciences. Our program is designed to ensure trainees
acquire skills in advanced data analytics to complement their primary training in environmental epidemiology,
climate science, molecular mechanisms of disease, and the exposome. The integration of additional training in
making diverse epidemiologic, toxicological and clinical data findable, accessible, interoperable and reusable
(FAIR) and ready for use with artificial intelligence and machine learning (AI/ML) is a natural progression for
our multi-disciplinary training program. We also benefit from the co-location of two other NIH-funded training
grants, in nursing informatics and neuroscience, with activities training biomedical researchers in data science.
We aim to leverage our collective expertise to develop a multidisciplinary curriculum that enables our trainees
to develop the competencies and skills needed to make diverse biomedical data FAIR and AI/ML-ready. This
curriculum will be designed to be flexible and module-based so it can be implemented in full, as part of existing
training seminars, or as stand-alone bootcamps, depending upon the needs of individual training programs.
Our novel curriculum will combine didactic seminars, guided discussions and hands-on training activities to
develop competencies and skills in the use of data standards, the FAIR principles and AI/ML-readiness. This
module-based curriculum will be centered on core foundational concepts, such as ontologies, common data
elements and metadata annotation. To construct these modules, we will draw upon expertise from faculty both
internal and external to Columbia University from within the fields of semantic science, information science,
environmental health data science, and computer science. We will consult with educational professionals who
will advise on evidence-based curricular design and provide independent evaluation of our curriculum and
training activities using both quantitative and qualitative measures. Following successful evaluation, we
propose to incorporate the developed curriculum and training activities into multiple existing training programs.
Recorded lectures, discussion guides and training materials will be made available within a shared resource
library. Formalizing supplementary training in the FAIR principle...

## Key facts

- **NIH application ID:** 10406009
- **Project number:** 3T32ES007322-20S1
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Pam R Factor-Litvak
- **Activity code:** T32
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $86,369
- **Award type:** 3
- **Project period:** 2000-07-01 → 2025-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10406009

## Citation

> US National Institutes of Health, RePORTER application 10406009, Supplemental Training in Making Data FAIR and AI/ML Ready (3T32ES007322-20S1). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10406009. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
