# Efficient Refactoring of Longitudinal Targeted Machine Learning

> **NIH R01** · UNIVERSITY OF CALIFORNIA BERKELEY · 2023 · $223,272

## Abstract

PROJECT SUMMARY

Personalized intervention in a heterogeneous population requires refined methods and analytics to estimate the treatment effects of dynamic interventions from longitudinal medical histories, in particular to choose the optimal individual intervention that maximizes the expected long-term outcome. In theory, the g-formula solves this longitudinal causal inference problem with multiple time-point interventions and time-dependent confounding [1]. The causal target parameter under a dynamic intervention is identified through the g-functional of the distribution of the observed data [2].
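
As a concrete illustration of the g-functional, consider two time points with covariates $L_0, L_1$, treatments $A_0, A_1$, outcome $Y$, and a dynamic rule $d$ that sets $A_k = d(L_k)$. The g-formula is then

$$\psi = \sum_{l_0, l_1} P(L_0 = l_0)\, P(L_1 = l_1 \mid l_0, d(l_0))\, E[Y \mid l_0, d(l_0), l_1, d(l_1)].$$

The sketch below evaluates this sum by exact enumeration in a toy binary model. The probability tables, the rule, and all function names are hypothetical and chosen only for illustration; the rule depends on the current covariate only, a simplification of general history-dependent rules.

```python
from itertools import product

# Hypothetical toy model (illustrative numbers only): binary covariates L0, L1
# and binary treatments A0, A1.
P_L0 = {0: 0.6, 1: 0.4}  # P(L0 = l0)


def p_L1(l1, l0, a0):
    """P(L1 = l1 | L0 = l0, A0 = a0) in the hypothetical model."""
    p1 = 0.2 + 0.5 * l0 - 0.1 * a0  # P(L1 = 1 | l0, a0)
    return p1 if l1 == 1 else 1.0 - p1


def mean_Y(l0, a0, l1, a1):
    """E[Y | L0, A0, L1, A1] in the hypothetical model."""
    return 0.25 + 0.2 * l0 + 0.3 * l1 - 0.1 * a0 - 0.1 * a1


def rule(l):
    """Hypothetical dynamic rule: treat when the current covariate equals 1."""
    return 1 if l == 1 else 0


def g_formula(rule):
    """Exact g-formula: sum over covariate histories with each A_k set to rule(L_k)."""
    psi = 0.0
    for l0, l1 in product((0, 1), repeat=2):
        a0, a1 = rule(l0), rule(l1)
        weight = P_L0[l0] * p_L1(l1, l0, a0)  # probability of the covariate history under the rule
        psi += weight * mean_Y(l0, a0, l1, a1)
    return psi
```

With these tables, `g_formula(rule)` returns approximately 0.362, versus approximately 0.45 under the static rule `lambda l: 0` (never treat). In realistic settings the conditional distributions are unknown and must be estimated, which is where the nuisance estimators discussed next come in.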

We have developed the longitudinal targeted minimum loss-based estimator (LTMLE) [3], a semiparametric, doubly robust, efficient plug-in estimator of g-functionals, and implemented it in the R package `ltmle` [4]. LTMLE requires nuisance parameter estimators with fast convergence rates. We have developed the theory of the highly adaptive lasso (HAL), which attains sufficiently fast convergence rates [5–7], and implemented the R packages `hal9001` [8] for general HAL estimation and `haldensify` [9] for conditional density estimation with HAL. LTMLE with g-computation, which estimates the full likelihood of the observed data using HAL, provides a generative approach that is compatible with modern generative artificial intelligence (AI) while retaining statistical guarantees.
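
HAL approximates a function by an L1-penalized (lasso) regression on a large set of indicator basis functions. A minimal sketch of the zero-order HAL basis, assuming knots at every observed point: for each non-empty covariate subset $s$ and observation $i$, $\phi_{s,i}(x) = \prod_{j \in s} 1[x_j \ge X_{i,j}]$. The helper `hal_basis` is hypothetical and is not the `hal9001` API; it only shows why the design matrix grows quickly, with $n(2^p - 1)$ columns for $n$ observations and $p$ covariates.

```python
from itertools import combinations


def hal_basis(X):
    """Zero-order HAL design matrix (hypothetical helper, illustrative sketch).

    X is a list of n observations, each a list of p covariate values. For every
    non-empty covariate subset s and every observation i used as a knot, the
    basis function is phi_{s,i}(x) = prod_{j in s} 1[x_j >= X[i][j]].
    Columns are ordered by subset, then by knot index.
    """
    n, p = len(X), len(X[0])
    subsets = [s for k in range(1, p + 1) for s in combinations(range(p), k)]
    knots = [(s, X[i]) for s in subsets for i in range(n)]

    def phi(x, s, knot):
        # Tensor product of one-dimensional indicators over the covariates in s.
        return 1 if all(x[j] >= knot[j] for j in s) else 0

    return [[phi(x, s, knot) for s, knot in knots] for x in X]
```

For `X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]`, the matrix has 3 rows and 9 columns; the lasso fit over these columns is the computationally expensive step.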

However, these packages are computationally inefficient when medical datasets become very large, because the current software runs on standard central processing units (CPUs), which have limited computational capacity for large-scale calculus and linear algebra. Recent Python libraries for vectorized computation on graphics processing units (GPUs) and tensor processing units (TPUs) can address this scalability problem. We will therefore implement LTMLE and HAL in Python with Google JAX [11], which provides automatic differentiation (autodiff) and optimized array computation on CPUs, GPUs, and TPUs.
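
Much of the linear-algebra cost comes from solving the L1-penalized regression over the HAL basis. A minimal sketch, assuming a plain proximal-gradient (ISTA) lasso solver rather than whatever solver the project adopts, for the objective $\frac{1}{2n}\lVert y - H\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$. Each iteration consists of the matrix-vector products $H\beta$ and $H^\top r$; in a JAX port these would be dense array operations that could be compiled with `jax.jit`, batched with `jax.vmap`, and differentiated with `jax.grad` instead of by hand. The code below is written in pure Python only to stay self-contained, and the function names are hypothetical.

```python
def soft_threshold(z, thr):
    """Proximal operator of thr * |z| (coordinate-wise soft-thresholding)."""
    if z > thr:
        return z - thr
    if z < -thr:
        return z + thr
    return 0.0


def lasso_ista(H, y, lam, n_iter=500):
    """Minimize (1/(2n)) * ||y - H b||^2 + lam * ||b||_1 by ISTA (hypothetical helper).

    Assumes H has at least one non-zero entry.
    """
    n, m = len(H), len(H[0])
    # Step size 1/L, where ||H||_F^2 / n upper-bounds the gradient's Lipschitz constant.
    frob2 = sum(h * h for row in H for h in row)
    step = n / frob2
    beta = [0.0] * m
    for _ in range(n_iter):
        # Residual r = y - H beta (matrix-vector product).
        r = [yi - sum(hij * bj for hij, bj in zip(row, beta)) for row, yi in zip(H, y)]
        # Gradient step: beta + (step / n) * H^T r (transposed matrix-vector product).
        z = [beta[j] + step / n * sum(H[i][j] * r[i] for i in range(n)) for j in range(m)]
        # Proximal step for the L1 penalty.
        beta = [soft_threshold(zj, step * lam) for zj in z]
    return beta
```

On the identity design `H = [[1.0, 0.0], [0.0, 1.0]]` with `y = [3.0, -0.5]` and `lam = 0.5`, the iterates converge to `[2.0, 0.0]`, the closed-form lasso solution for an orthogonal design.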

Designing software that is accessible to data scientists is essential for its sustainability. Causal inference from longitudinal observational data is also the central problem in reinforcement learning (RL) [12]: estimating the mean counterfactual outcome under a dynamic intervention with time-dependent confounding is called off-policy evaluation (OPE) in the offline RL literature [13]. We will design the software for researchers from both fields and guide users with vignettes built around familiar scenarios to ease adoption.
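
In RL terms, the dynamic intervention is a target policy and the observed treatment mechanism is the behavior policy. A minimal sketch of trajectory-wise importance sampling for OPE, which corresponds to the inverse-probability-weighted (IPW) estimator in epidemiology; unlike LTMLE, it is neither doubly robust nor efficient. It assumes a known behavior policy with positive probability for every logged action, and actions that depend only on the current state (a simplification of history-dependent treatment mechanisms). The function name and data layout are hypothetical.

```python
def is_estimate(trajectories, target_prob, behavior_prob):
    """Trajectory-wise importance-sampling (IPW) estimate of the mean outcome under a target policy.

    trajectories: list of (history, outcome), where history is a list of (state, action) pairs.
    target_prob(s, a), behavior_prob(s, a): probability each policy assigns to action a in state s.
    """
    total = 0.0
    for history, outcome in trajectories:
        weight = 1.0
        for s, a in history:
            # Cumulative likelihood ratio of the logged actions (requires behavior_prob > 0).
            weight *= target_prob(s, a) / behavior_prob(s, a)
        total += weight * outcome
    return total / len(trajectories)
```

For example, with a uniformly random behavior policy (`lambda s, a: 0.5`) and the deterministic target rule "treat if and only if the state equals 1", a trajectory receives weight 4 over two steps when both logged actions agree with the rule and weight 0 otherwise.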

We set the following three aims:

- **Aim 1:** Develop an efficient implementation of g-computation of LTMLE with HAL for survival outcomes using Google JAX.
- **Aim 2:** Design user-friendly LTMLE software for biomedical researchers and data scientists.
- **Aim 3:** Present the software, its performance, and its applications at academic conferences in epidemiology and computer science.
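
For a survival outcome in discrete time, one common parametrization (assumed here; the abstract does not specify the project's design) models conditional hazards. Along one simulated covariate path under the intervention, survival to time $t$ is $S(t) = \prod_{k \le t} (1 - h_k)$, and g-computation averages these curves over simulated paths. In the sketch below, the hazards are taken as given (for instance, from fitted HAL models), and both function names are hypothetical.

```python
def survival_curve(hazards):
    """Discrete-time survival S(t) = prod_{k <= t} (1 - h_k) along one covariate path."""
    curve, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve


def g_comp_survival(hazard_paths):
    """Monte Carlo g-computation: average per-path survival curves.

    hazard_paths: one list of hazards per simulated covariate path under the
    intervention; all lists are assumed to have the same length.
    """
    curves = [survival_curve(h) for h in hazard_paths]
    n = len(curves)
    return [sum(c[t] for c in curves) / n for t in range(len(curves[0]))]
```

Averaging survival curves, rather than averaging hazards first, matters because the marginal survival under the intervention is generally not the product of marginal hazards.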

## Key facts

- **NIH application ID:** 10839657
- **Project number:** 3R01AI074345-13S1
- **Recipient organization:** UNIVERSITY OF CALIFORNIA BERKELEY
- **Principal Investigator:** Maya Liv Petersen
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $223,272
- **Award type:** 3 (supplement)
- **Project period:** 2007-07-01 → 2025-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10839657

## Citation

> US National Institutes of Health, RePORTER application 10839657, Efficient Refactoring of Longitudinal Targeted Machine Learning (3R01AI074345-13S1). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10839657. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*