Statistical methods for correlated outcome and covariate errors in studies of HIV/AIDS

NIH RePORTER · NIH · R01 · $659,345

Abstract

Abstract (Shepherd and Shaw, R01, Statistical methods for correlated outcome and covariate errors in studies of HIV/AIDS) There is growing interest in using administrative electronic health record (EHR) data and other routinely collected data sources as cost-effective means to support HIV/AIDS research. Validation of observational cohort and EHR data demonstrates a substantial presence of errors in these types of data. There may be errors in failure and censoring times (e.g., time from ART initiation to clinical events), event classifications, and covariates (e.g., CD4 at ART initiation), with strong correlation between the magnitudes of errors in these variables. These correlated errors can bias estimation. Ideally, researchers could validate a subsample of their data and use information learned from this subsample to improve estimation for the entire cohort, thereby obtaining valid estimates without validating the entire database. However, the current lack of methods and software to correct for these types of errors in time-to-event outcomes is a major barrier to performing correct inference on these types of data. There is also little guidance on which records and variables to validate to optimize resources. This project will create novel statistical methods for estimation to reduce or eliminate bias caused by correlated errors in failure-time outcomes and associated covariates. The developed methods will use information on the structure of the measurement error, gained from data validation or audit subsets, to adjust estimation and correct for errors that remain in the unvalidated data. The project will develop and examine extensions of regression calibration, corrected scores, and multiple imputation methods, augmented with raking techniques, to address these correlated errors.
The project will also develop efficient data validation and audit sampling designs that use adaptive, multi-wave sampling to target successive validation and audit subsets towards informative subgroups of patients. Open-source tools will be developed to allow researchers to implement these methods and study designs. The methods and designs will be applied to data from the International Epidemiologic Databases to Evaluate AIDS (IeDEA) to estimate the incidence of tuberculosis and Kaposi's sarcoma and their outcomes, risk factor associations, and temporal trends among persons living with HIV in East Africa and Latin America.
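
As a rough illustration of the idea of validating a subsample and using it to correct estimation in the full cohort, the sketch below applies basic regression calibration to simulated data. It is a deliberately simplified stand-in, not the project's method: it assumes a linear outcome rather than a time-to-event outcome, error in the covariate only (no correlated outcome error), and a simple random validation sample. All names (`regression_calibration`, `x_err`, `validated_idx`) and all simulation settings are hypothetical.

```python
import numpy as np

def regression_calibration(x_err, y, validated_idx, x_true_validated):
    """Return (naive_slope, calibrated_slope) for a linear model y ~ x.

    Hypothetical helper for illustration only; assumes classical covariate
    error and an error-free outcome.
    """
    # Step 1: in the validated subsample, regress the true covariate
    # on the error-prone one to learn the calibration line.
    a1, a0 = np.polyfit(x_err[validated_idx], x_true_validated, 1)
    # Step 2: replace the error-prone covariate with its calibrated
    # prediction for every record, validated or not.
    x_cal = a0 + a1 * x_err
    # Step 3: fit the outcome model with the raw and the calibrated covariate.
    naive_slope = np.polyfit(x_err, y, 1)[0]
    calibrated_slope = np.polyfit(x_cal, y, 1)[0]
    return naive_slope, calibrated_slope

# Simulated cohort (assumed settings): true slope 1.0, noisy covariate.
rng = np.random.default_rng(0)
n, n_val, beta = 5000, 500, 1.0
x_true = rng.normal(size=n)
x_err = x_true + rng.normal(scale=1.0, size=n)      # error-prone covariate
y = beta * x_true + rng.normal(scale=0.5, size=n)   # outcome depends on the truth

# Only a random 10% of records are "validated" (true covariate observed).
validated_idx = rng.choice(n, size=n_val, replace=False)
naive, calibrated = regression_calibration(
    x_err, y, validated_idx, x_true[validated_idx]
)
print(f"naive slope: {naive:.2f}, calibrated slope: {calibrated:.2f}")
```

In this setup the naive slope is attenuated toward zero, while the calibrated slope should fall closer to the true value. The abstract's setting is harder, since errors in the event time and the covariates are correlated, which this sketch does not attempt to handle.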

Key facts

NIH application ID
10107734
Project number
5R01AI131771-04
Recipient
VANDERBILT UNIVERSITY MEDICAL CENTER
Principal Investigator
Pamela A Shaw
Activity code
R01
Funding institute
NIH
Fiscal year
2021
Award amount
$659,345
Award type
5
Project period
2019-02-04 → 2023-01-31