# Bayesian machine learning for complex missing data and causal inference with a focus on cardiovascular and obesity studies

> **NIH R01** · UNIVERSITY OF FLORIDA · 2023 · $548,304

## Abstract

Project Summary
This proposal will develop Bayesian machine learning approaches based on Bayesian nonparametrics (BNP) for three settings:
handling nonignorable missingness (in outcomes and covariates) and conducting causal inference with electronic health
records (EHRs); addressing missingness in multivariate longitudinal data; and causal mediation problems. Missing data
remain a problem in clinical studies and, in particular, in studies using EHRs. In clinical studies, more effort is
spent trying to minimize missingness, but it still remains a problem; in studies based on EHRs, missingness is a
constant (and less controllable) issue. In addition, there has been limited work on using auxiliary information in EHRs
to enhance the ability to deal with missing data. Approaches for missingness in multivariate longitudinal data are
underdeveloped, yet relevant across many clinical trial settings, from cost-effectiveness analysis to incomplete
time-varying auxiliary covariates (or confounders) to causal mediation to multiple outcomes of interest. The mechanisms
of treatment effectiveness are of particular interest in behavioral trials: specifically, how do different processes
mediate the effect of an intervention? Answering this question can help in designing future interventions. However,
determining the causal effect of such 'mediators' on outcomes is difficult. We will develop new approaches to identify
these effects in the complex setting of cluster randomized trials, for which little work has been done. For all these
settings, a Bayesian approach is ideal because it allows one to appropriately characterize uncertainty about
unverifiable assumptions (which are present in all these problems) and provides the flexibility of Bayesian
nonparametric models. MCMC algorithms for BNP can sometimes converge slowly and can be untenable for large n. We will
extend existing approaches to address both of these complications, which will be important for all the applications
considered and, more generally, given the increasing size and complexity of data.
The methods are motivated by several NHLBI-funded studies whose PIs are co-investigators on this proposal, and will be
developed to help answer numerous important clinical questions, including the mechanisms of behavior change in weight
management and the impact of linkage to (and engagement in) care on treatment effectiveness for blood pressure
outcomes. The methods will also help us evaluate potentially synergistic effects when drugs with potential
diabetogenic effects are used concomitantly, and whether the impact on cancer outcomes varies across different
bariatric surgeries.
The history of collaboration among the entire study team will help produce the best science and facilitate
dissemination of our methodological and clinical findings. We will disseminate code for these methods (via the PI's
GitHub page and software papers) to ensure the methods will be readily usable by investigators involved in
cardiovascular, obesity, diabetes, and cancer st...

## Key facts

- **NIH application ID:** 10563598
- **Project number:** 1R01HL166324-01
- **Recipient organization:** UNIVERSITY OF FLORIDA
- **Principal Investigator:** Michael J Daniels
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $548,304
- **Award type:** 1 (new application)
- **Project period:** 2023-03-01 → 2027-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10563598

## Citation

> US National Institutes of Health, RePORTER application 10563598, Bayesian machine learning for complex missing data and causal inference with a focus on cardiovascular and obesity studies (1R01HL166324-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10563598. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
