Bayesian machine learning for causal inference with incomplete longitudinal covariates and censored survival outcomes

NIH RePORTER · NIH · R01 · $722,305

Abstract

Project Summary

Population cohort studies funded by the National Institutes of Health, including the Atherosclerosis Risk in Communities (ARIC) Study and the Multi-Ethnic Study of Atherosclerosis (MESA), are widely used in cardiovascular research and have provided fundamental knowledge for cardiovascular disease (CVD) prevention strategies and public health policies. Pooling data across multiple cohorts provides a unique opportunity for in-depth investigations of emerging CVD research questions that heretofore would not have been possible, such as the optimal blood pressure threshold for initiating antihypertensive treatment in young adults. While the pooled cohorts data form fertile ground for innovative research, the methodological issues they raise cannot be effectively addressed by existing statistical methods. There are three main analytic challenges. First, many discrete or continuous longitudinal variables have missing values with various missing data patterns. Existing methods either are susceptible to misspecification bias or do not provide coherent estimates of imputation uncertainty, and they cannot handle data that are missing not at random. Second, current causal inference methods require either aligned measurement time points or parametric assumptions about the forms of causal pathways, neither of which can be satisfied in complex longitudinal health data. Third, violations of the “sequential ignorability” assumption embedded in causal inference methodology are a potential source of bias, and sensitivity analysis methods for time-varying confounding with censored survival outcomes are underdeveloped. To overcome these challenges and improve statistical and CVD research, we propose a suite of generalizable statistical methods utilizing machine learning.
We propose to develop a scalable Bayesian nonparametric (BNP) framework to impute continuous or discrete longitudinal covariates that are missing at random while providing coherent uncertainty intervals, and to address the missing-not-at-random mechanism via sensitivity analysis. We will apply the developed method to address missing data issues for several longitudinal CVD risk factors, such as blood pressure and cholesterol levels (Specific Aim 1). We further propose to develop a robust and computationally efficient BNP causal inference method (Specific Aim 2) and a new continuous-time marginal structural survival model from a Bayesian perspective (Specific Aim 3) to study and validate the survival effects of time-varying antihypertensive treatments for young adults and the frail elderly; to develop a flexible and interpretable survival sensitivity analysis method to assess the sensitivity of the causal effect estimates to varying degrees of sequential unmeasured confounding (Specific Aim 4); and to create usable R software packages for all proposed methods and develop tutorial papers and short courses to bridge theoretical and practical knowledge and promote use of our methods (Specific Aim 5).

Key facts

NIH application ID
10445648
Project number
1R01HL159077-01A1
Recipient
RBHS-SCHOOL OF PUBLIC HEALTH
Principal Investigator
Liangyuan Hu
Activity code
R01
Funding institute
NIH
Fiscal year
2022
Award amount
$722,305
Award type
1
Project period
2022-05-15 → 2023-04-30