# Fairness in Practice: Defining and Implementing Diversity & Representation in AI datasets for Healthcare

> **NIH R01** · STANFORD UNIVERSITY · 2024 · $499,530

## Abstract

There is widespread agreement that fairness—ensuring equitable performance across similarly situated
individuals and across groups—is a fundamental principle for ethical development of artificial intelligence for
health care (AI-HC). In the U.S., the NIH has launched initiatives to address unfairness due to lack of diversity
in datasets, such as the All of Us Research Program and the Human Pangenome Reference Consortium.
Other efforts focus on diversity and representation among researchers. However, it is not yet clear whether
these initiatives are sufficient to achieve fairness in AI-HC. Concepts and practices associated with diversity
and representation shape the extent to which AI-HC researchers and developers achieve goals for fairness.
Yet insufficient understanding of how diversity and representation are conceptualized and put into practice
within AI-HC projects is an impediment to achieving fairness in AI-HC datasets. Prior studies of biomedical
research indicate that scientists often hold differing concepts of diversity (e.g., genetic or other biomarkers,
self-reported race/ethnicity) and representation (e.g., the inclusion of specific historically underrepresented groups
in datasets vs. addressing how structural inequalities impact the data for underrepresented groups). In turn,
these concepts shape implementation of the practices used to achieve diversity and representation in datasets,
such as diversifying researchers or research participants, or applying technical solutions to bias. We propose to
assess how diversity and representation are conceptualized and put into practice in 50 NIH-funded AI-HC
research projects. Our analysis will be guided by Steven Epstein’s “inclusion-and-difference paradigm” as a
conceptual framework. We will employ a “microethics” perspective, which focuses analysis on how high-level
ethical goals, like fairness, are understood and put into practice in technical fields. This perspective allows
examination of how different actors (e.g., data scientists, clinicians, annotators) involved in AI dataset
development perceive three sets of issues: how diversity and representation are defined; trade-offs between scientific
and diversity goals; and the downstream impact on fairness. Informed by these findings, we will develop
evidence-informed practical guidance to support the future creation of fair datasets in AI-HC. Our aims are:

- **Aim 1:** Assess data scientists’ concepts and practices relevant to diversity and representation in creating datasets for AI-HC. This aim will be achieved through a review of policy and guidance documents (Aim 1a) and interviews of approximately 100 investigators from 50 systematically sampled NIH-funded AI-HC research projects (Aim 1b).
- **Aim 2:** Formulate and disseminate practical, evidence-informed guidance to support fairness in AI-HC development. This aim will be achieved through a modified Delphi process engaging experts in areas relevant to fairness, diversity and representation in AI-HC.

We will use a multi-pronge...

## Key facts

- **NIH application ID:** 10986273
- **Project number:** 1R01HG014227-01A1
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** Mildred K. Cho
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $499,530
- **Award type:** 1
- **Project period:** 2024-09-01 → 2027-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10986273

## Citation

> US National Institutes of Health, RePORTER application 10986273, Fairness in Practice: Defining and Implementing Diversity & Representation in AI datasets for Healthcare (1R01HG014227-01A1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10986273. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*