# Bayesian machine learning for causal inference with incomplete longitudinal covariates and censored survival outcomes

> **NIH R01** · RBHS-SCHOOL OF PUBLIC HEALTH · 2022 · $722,305

## Abstract

Population cohort studies funded by the National Institutes of Health, including the Atherosclerosis Risk in Communities (ARIC) Study and the Multi-Ethnic Study of Atherosclerosis (MESA), are widely used in cardiovascular research and have provided fundamental knowledge for cardiovascular disease (CVD) prevention strategies and public health policies. Pooling data across multiple cohorts provides a unique opportunity for in-depth investigations of emerging CVD research questions that heretofore would not have been possible, such as the optimal blood pressure threshold values for initiating antihypertensive treatment in young adults. While the pooled cohort data form a fertile ground for innovative research, their methodological issues cannot be effectively addressed by existing statistical methods.

There are three main analytic challenges. First, many discrete or continuous longitudinal variables have missing values with various missing data patterns. Existing methods either are susceptible to misspecification bias or do not provide coherent estimates of imputation uncertainty, and they cannot handle data that are missing not at random. Second, current causal inference methods require either aligned measurement time points or parametric assumptions about the forms of causal pathways, neither of which can be satisfied in complex longitudinal health data. Third, violations of the “sequential ignorability” assumption embedded in causal inference methodology can be a potential source of bias, and sensitivity analysis methods for time-varying confounding with censored survival outcomes are underdeveloped.

To overcome these challenges and improve statistical and CVD research, we propose a suite of generalizable statistical methods utilizing machine learning. We propose to develop a scalable Bayesian nonparametric (BNP) framework that imputes continuous or discrete longitudinal covariates that are missing at random while providing coherent uncertainty intervals, and that addresses the missing not at random mechanism via sensitivity analysis, and to apply the developed method to missing data issues for several longitudinal CVD risk factors, such as blood pressure and cholesterol levels (Specific Aim 1); to develop a robust and computationally efficient BNP causal inference method (Specific Aim 2) and a new continuous-time marginal structural survival model from a Bayesian perspective (Specific Aim 3) to study and validate the survival effects of time-varying antihypertensive treatments for young adults and the frail elderly; to develop a flexible and interpretable survival sensitivity analysis method to assess the sensitivity of the causal effect estimates to varying degrees of sequential unmeasured confounding (Specific Aim 4); and to create usable R software packages for all proposed methods and develop tutorial papers and short courses to bridge theoretical and practical knowledge and promote use of our methods (Specific Aim 5).

## Key facts

- **NIH application ID:** 10445648
- **Project number:** 1R01HL159077-01A1
- **Recipient organization:** RBHS-SCHOOL OF PUBLIC HEALTH
- **Principal Investigator:** Liangyuan Hu
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $722,305
- **Award type:** 1
- **Project period:** 2022-05-15 → 2023-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10445648

## Citation

> US National Institutes of Health, RePORTER application 10445648, Bayesian machine learning for causal inference with incomplete longitudinal covariates and censored survival outcomes (1R01HL159077-01A1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10445648. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*