Project Summary
Population cohort studies funded by the National Institutes of Health, including the Atherosclerosis Risk in Communities (ARIC) Study and the Multi-Ethnic Study of Atherosclerosis (MESA), are widely used in cardiovascular research and have provided fundamental knowledge for cardiovascular disease (CVD) prevention strategies and public health policies. Pooling data across multiple cohorts provides a unique opportunity for in-depth investigations of emerging CVD research questions that heretofore would not have been possible, such as the optimal blood pressure thresholds for initiating antihypertensive treatment in young adults. While pooled cohort data form a fertile ground for innovative research, the methodological issues they raise cannot be effectively addressed by existing statistical methods. There are three main analytic challenges. First, many discrete or continuous longitudinal variables have missing values with various missing data patterns. Existing methods either are susceptible to misspecification biases or do not provide coherent estimates of imputation uncertainty, and they cannot handle data that are missing not at random. Second, current causal inference methods require either aligned measurement time points or parametric assumptions about the forms of causal pathways, neither of which can be satisfied in complex longitudinal health data. Third, violations of the "sequential ignorability" assumption embedded in causal inference methodology are a potential source of bias, and sensitivity analysis methods for time-varying confounding with censored survival outcomes are underdeveloped. To overcome these challenges and improve statistical and CVD research, we propose a suite of generalizable statistical methods utilizing machine learning.
We propose to develop a scalable Bayesian nonparametric (BNP) framework that imputes continuous or discrete longitudinal covariates that are missing at random while providing coherent uncertainty intervals, and that addresses the missing-not-at-random mechanism via sensitivity analysis. We will apply this framework to missing data in several longitudinal CVD risk factors, such as blood pressure and cholesterol levels (Specific Aim 1). We will also develop a robust and computationally efficient BNP causal inference method (Specific Aim 2) and a new continuous-time marginal structural survival model from a Bayesian perspective (Specific Aim 3) to study and validate the survival effects of time-varying antihypertensive treatments for young adults and the frail elderly; develop a flexible and interpretable survival sensitivity analysis method to assess the sensitivity of the causal effect estimates to varying degrees of sequential unmeasured confounding (Specific Aim 4); and create usable R software packages for all proposed methods, together with tutorial papers and short courses, to bridge theoretical and practical knowledge and promote use of our methods (Specific Aim 5).