# Statistical methods for correlated outcome and covariate errors in studies of HIV/AIDS

> **NIH R01** · VANDERBILT UNIVERSITY MEDICAL CENTER · 2020 · $534,133

## Abstract

Abstract (Shepherd and Shaw, R01, Statistical methods for correlated outcome and covariate errors in studies
of HIV/AIDS)
There is growing interest in using administrative electronic health record (EHR) data and other routinely
collected data sources as cost-effective means to support HIV/AIDS research. Validation of observational
cohort and EHR data demonstrates the substantial presence of errors in these types of data. There may be
errors in failure and censoring times (e.g., time from ART initiation to clinical events), event classifications, and
covariates (e.g., CD4 at ART initiation), with strong correlation between the magnitudes of errors in these
variables. These correlated errors can bias estimation. Ideally, researchers could validate a subsample of their
data and use information learned from this subsample to improve estimation for the entire cohort, thereby
obtaining valid estimates without validating the entire database. However, the current lack of available methods
and software to correct for these types of errors for time-to-event outcomes is a major barrier to performing
correct inference on these types of data. There is also little guidance on what records and variables to validate
to optimize resources. This project will create novel statistical methods for estimation to reduce or eliminate
bias caused by correlated errors in failure-time outcomes and associated covariates. The developed methods
will use information on the structure of the measurement error, gained by data validation or audit subsets, to
adjust estimation and correct for errors that remain in the unvalidated data. The project will develop and
examine extensions of regression calibration, corrected scores, and multiple imputation methods, augmented
with raking techniques to address these correlated errors. The project will also develop efficient data validation
and audit sampling designs that use adaptive, multi-wave sampling in order to target successive validation and
audit subsets towards informative subgroups of patients. Open source tools will be developed to allow
researchers to implement these methods and study designs. The methods and designs will be applied to data
from the International Epidemiologic Databases to Evaluate AIDS (IeDEA) to estimate the incidence of
tuberculosis and Kaposi's sarcoma and their outcomes, risk factor associations, and temporal trends among
persons living with HIV in East Africa and Latin America.
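
The abstract names regression calibration as one of the methods to be extended. As a rough illustration of the basic idea only, the following sketch simulates a cohort with an error-prone covariate, validates a random subsample, and uses that subsample to calibrate the covariate for everyone. It is not the project's method: it assumes a linear outcome model rather than a time-to-event model, classical additive error in the covariate only (no outcome error, no correlation between errors), and simple random validation sampling. All function names, sample sizes, and parameter values are hypothetical.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_cohort(n=5000, n_valid=500, beta=1.0, seed=0):
    """Simulate a toy cohort (all values are illustrative assumptions)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]    # true covariate
    x_obs = [x + rng.gauss(0.0, 1.0) for x in x_true]   # error-prone record
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]  # outcome, no error
    valid_idx = rng.sample(range(n), n_valid)           # validated records
    return x_true, x_obs, y, valid_idx


def naive_slope(x_obs, y):
    """Regress the outcome on the error-prone covariate, ignoring the error."""
    return ols_fit(x_obs, y)[1]


def calibrated_slope(x_true, x_obs, y, valid_idx):
    """Regression calibration using only the validated subsample's true values."""
    # Step 1: in the validation subsample, model the true value given the record.
    a, b = ols_fit([x_obs[i] for i in valid_idx],
                   [x_true[i] for i in valid_idx])
    # Step 2: replace every record with its predicted true value.
    x_hat = [a + b * xo for xo in x_obs]
    # Step 3: fit the outcome model on the calibrated covariate.
    return ols_fit(x_hat, y)[1]


if __name__ == "__main__":
    x_true, x_obs, y, valid_idx = simulate_cohort()
    print("naive slope:     ", round(naive_slope(x_obs, y), 3))
    print("calibrated slope:", round(calibrated_slope(x_true, x_obs, y, valid_idx), 3))
```

With equal variances for the true covariate and the error, as assumed here, the naive slope is attenuated toward about half the true value, while the calibrated slope should land much closer to it. A realistic analysis would also need standard errors that account for the calibration step, which this sketch omits.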

## Key facts

- **NIH application ID:** 9861024
- **Project number:** 5R01AI131771-03
- **Recipient organization:** VANDERBILT UNIVERSITY MEDICAL CENTER
- **Principal Investigator:** Pamela A Shaw
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $534,133
- **Award type:** 5
- **Project period:** 2019-02-04 → 2023-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9861024

## Citation

> US National Institutes of Health, RePORTER application 9861024, Statistical methods for correlated outcome and covariate errors in studies of HIV/AIDS (5R01AI131771-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9861024. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
