Bayesian machine learning for causal inference with incomplete longitudinal covariates and censored survival outcomes

NIH RePORTER · NIH · R01 · $722,305

Abstract

Project Summary

Population cohort studies funded by the National Institutes of Health, including the Atherosclerosis Risk in Communities (ARIC) Study and the Multi-Ethnic Study of Atherosclerosis (MESA), are widely used in cardiovascular research and have provided fundamental knowledge for cardiovascular disease (CVD) prevention strategies and public health policies. Pooling data across multiple cohorts provides a unique opportunity for in-depth investigations of emerging CVD research questions that heretofore would not have been possible, such as the optimal blood pressure threshold for initiating antihypertensive treatment in young adults. While pooled cohort data form fertile ground for innovative research, the methodological issues they raise cannot be effectively addressed by existing statistical methods. There are three main analytic challenges. First, many discrete or continuous longitudinal variables have missing values with various missing data patterns. Existing methods either are susceptible to misspecification bias or do not provide coherent estimates of imputation uncertainty, and they cannot handle data that are missing not at random. Second, current causal inference methods require either aligned measurement time points or parametric assumptions about the forms of causal pathways, neither of which can be satisfied in complex longitudinal health data. Third, violations of the “sequential ignorability” assumption embedded in causal inference methodology are a potential source of bias, and sensitivity analysis methods for time-varying confounding with censored survival outcomes are underdeveloped. To overcome these challenges and improve statistical and CVD research, we propose a suite of generalizable statistical methods utilizing machine learning.
We propose to develop a scalable Bayesian nonparametric (BNP) framework to impute continuous or discrete longitudinal covariates that are missing at random while providing coherent uncertainty intervals, and to address the missing not at random mechanism via sensitivity analysis; we will apply this framework to missing data in several longitudinal CVD risk factors, such as blood pressure and cholesterol levels (Specific Aim 1). We will also develop a robust and computationally efficient BNP causal inference method (Specific Aim 2) and a new continuous-time marginal structural survival model from a Bayesian perspective (Specific Aim 3) to study and validate the survival effects of time-varying antihypertensive treatments for young adults and the frail elderly; develop a flexible and interpretable survival sensitivity analysis method to assess the sensitivity of the causal effect estimates to varying degrees of sequential unmeasured confounding (Specific Aim 4); and create usable R software packages for all proposed methods, along with tutorial papers and short courses, to bridge theoretical and practical knowledge and promote use of our methods (Specific Aim 5).

Key facts

NIH application ID: 10445648
Project number: 1R01HL159077-01A1
Recipient: RBHS-SCHOOL OF PUBLIC HEALTH
Principal Investigator: Liangyuan Hu
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $722,305
Award type: 1
Project period: 2022-05-15 → 2023-04-30