# An ethical framework-guided metric tool for assessing bias in EHR-based Big Data studies

> **NIH R01** · UNIVERSITY OF SOUTH CAROLINA AT COLUMBIA · 2022 · $267,578

## Abstract

The emergence of Big Data health research has greatly advanced the fields of medicine and public health, but it has also raised many ethical challenges. One of the most worrying, yet still under-researched, of these is the risk of bias in datasets (e.g., electronic health record [EHR] data) and in the data curation and acquisition cycles. Very few EHR-based studies report bias in their datasets, data acquisition, or data mining as an indicator of research quality. Three gaps contribute to this: the lack of a standardized measurement tool or metrics for assessing bias; the scarcity of ethical frameworks to serve as theoretical grounding; and limited interdisciplinary collaboration that brings ethics experts, professional data curators, data management experts, data repository administrators, healthcare workers, and state agencies together to address this challenge.

Since 2021, we have been funded by NIH (R01AI164947) to develop a machine learning-based predictive model of viral suppression among HIV patients, using EHR and other relevant data from multiple sources in South Carolina. One ethical challenge encountered by this parent project is how to assess potential bias in the curation, acquisition, and processing of EHR data.

In response to NOT-OD-22-065, titled “Administrative supplements for advancing the ethical development and use of AI/ML in biomedical and behavioral sciences”, we propose to develop, refine, and pilot test an ethical framework-guided metric tool for assessing bias in Big Data research that uses EHR datasets. Specifically, we request support to:

1. Conduct a literature/policy review and concept analysis to develop an ethical framework for unbiased and inclusive Big Data research;
2. Create and modify a metric tool for assessing potential bias in EHR-based studies through in-depth interviews with key stakeholders of the parent project; and
3. Refine and disseminate the metric tool through a community charrette workshop with interdisciplinary scholars (ethics experts and disciplinary experts) and key stakeholders (data curators, data management experts, and data repository administrators; healthcare workers; and HIV patients), and pilot test it in the parent project.

The proposed study will advance our understanding of bias and equity issues in Big Data research and will produce an ethical framework and a metric tool for assessing bias in EHR-based Big Data studies. These products are intended to inform more nuanced assessment and exploration of bias in practice, supporting the ethical development of Big Data health research beyond the parent project. The metric tool can be reused in other Big Data studies to detect and quantify bias, which may help improve awareness and exploration of this critical ethical challenge. The ethical framework may also provide insight and guidance for addressing bias in types of Big Data other than EHR.

## Key facts

- **NIH application ID:** 10599459
- **Project number:** 3R01AI164947-02S2
- **Recipient organization:** UNIVERSITY OF SOUTH CAROLINA AT COLUMBIA
- **Principal Investigator:** Bankole Olatosi
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $267,578
- **Award type:** 3 (supplement)
- **Project period:** 2021-06-09 → 2026-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10599459

## Citation

> US National Institutes of Health, RePORTER application 10599459, An ethical framework-guided metric tool for assessing bias in EHR-based Big Data studies (3R01AI164947-02S2). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10599459. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*