# Identifying and understanding drivers of selection bias and information bias in clinical COVID-19 data

> **NIH R21** · OREGON HEALTH & SCIENCE UNIVERSITY · 2021 · $200,970

## Abstract

 During the COVID-19 pandemic, there is an immediate need for high-quality data for studies that
support patient care, predict outcomes, identify and evaluate treatments, allocate resources, and make
operations and policy decisions. While prospective research produces higher-quality evidence, retrospective
studies that reuse clinical data can be executed in a shorter time frame and for less cost, both of which are
crucial for research in a pandemic. Unfortunately, it has been shown that the usefulness and validity of
available COVID-19 data are constrained by various forms of selection bias and information bias, which may
lead to invalid findings in research and analytics and to disparities in resulting healthcare practices.
 The objective of the proposed work is to study the selection and information biases present in clinically
derived COVID-19 datasets by integrating COVID-19 datasets from OHSU and the National COVID Cohort
Collaborative with novel and traditional sources of clinical, epidemiological, social media, and citizen-generated
data. From each data source we will extract data indicating COVID-19, as well as a set of social determinants
of health that are commonly associated with healthcare utilization and access. To test for the presence of
selection bias, we will construct and compare categorical probability distributions for each social determinant
across COVID-19 cases in each data source. Differences in these distributions will indicate selection bias in
one or more of the data sources. Next, we will assess information bias by extending and adapting tests for
missingness and other forms of information bias in the COVID-19 datasets to determine if the quantity and
quality of these data vary with respect to clinical factors and those related to social determinants of health.
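
The two checks described above can be illustrated with a minimal sketch. It is not the project's actual method: the abstract does not name a comparison statistic, so total variation distance is used here purely as one simple way to compare categorical distributions, and the field names (`insurance`, `spo2`) and all values are hypothetical toy data.

```python
from collections import Counter


def categorical_distribution(values):
    """Return {category: proportion} over the non-missing values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return {}
    counts = Counter(observed)
    return {cat: n / len(observed) for cat, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two categorical distributions (0 = identical)."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def missingness_by_group(records, group_key, field):
    """Return {group: fraction of records in that group where `field` is missing}."""
    totals = Counter()
    missing = Counter()
    for record in records:
        group = record.get(group_key)
        totals[group] += 1
        if record.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical toy data: one social determinant (insurance status) recorded
# for COVID-19 cases in two made-up data sources.
source_a = ["private"] * 6 + ["public"] * 3 + ["uninsured"] * 1
source_b = ["private"] * 3 + ["public"] * 3 + ["uninsured"] * 4

dist_a = categorical_distribution(source_a)
dist_b = categorical_distribution(source_b)
# A large distance suggests the sources capture different populations.
selection_gap = total_variation_distance(dist_a, dist_b)

# Hypothetical clinical records: is a measurement missing more often in one group?
records = [
    {"insurance": "private", "spo2": 97},
    {"insurance": "private", "spo2": 95},
    {"insurance": "uninsured", "spo2": None},
    {"insurance": "uninsured", "spo2": 96},
]
missing_rates = missingness_by_group(records, "insurance", "spo2")
```

In practice, a formal test (for example a chi-square test of homogeneity) and adjustment for sample size would likely be needed before concluding that a difference reflects bias rather than sampling noise.
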
 This proposal therefore addresses a significant gap in knowledge: understanding not just the disparities
in who is impacted by COVID-19, but who is represented by the data we have available for learning more
about the disease. Identifying and estimating the influence of social determinants of health on selection
bias and information bias in COVID-19 data can guide the use of statistical and analytic approaches that can
improve the external and internal validity of research and analytics that rely on these data, including estimates
of disease prevalence, understanding the natural course of COVID-19, and identifying patients who are at risk
for severe disease.

## Key facts

- **NIH application ID:** 10192372
- **Project number:** 1R21LM013645-01
- **Recipient organization:** OREGON HEALTH & SCIENCE UNIVERSITY
- **Principal Investigator:** Nicole Gray Weiskopf
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $200,970
- **Award type:** 1
- **Project period:** 2021-04-01 → 2023-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10192372

## Citation

> US National Institutes of Health, RePORTER application 10192372, Identifying and understanding drivers of selection bias and information bias in clinical COVID-19 data (1R21LM013645-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10192372. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
