A Pragmatic Latent Variable Learning Approach Aligned with Clinical Practice

NIH RePORTER · NIH · R01 · $558,490

Abstract

With growing interest in personalized medicine and the rise of machine learning, the construction of good risk prediction and prognostic models has been drawing renewed attention. Much of this effort is concentrated on identifying good predictors of patient outcomes, while the same level of rigor is often absent on the outcome side of prediction. The majority of popular supervised techniques (e.g., regularized logistic regression and its variations) that can be readily applied in risk model development assume that the prediction target is a clear single outcome measured at a single time point. In clinical reality, patient outcomes are often complex, multivariate, and measured with error. Even when the target is a relatively clear univariate outcome (e.g., death, cancer, diabetes), the process that leads to this ultimate outcome often involves complex intermediate outcomes, and predicting and understanding this intermediate process can be crucial in providing effective care and preventing negative ultimate outcomes. The situation calls for a flexible learning framework that can easily incorporate this important but neglected aspect of model development: better characterizing and constructing prediction targets before building prediction models. Focusing on risk labels as prediction targets, we propose a pragmatic 3-stage learning approach in which we sequentially 1) generate latent labels, 2) validate them using explicit validators, and 3) proceed with supervised learning on the labeled data. Latent variable (LV) strategies used in Stage 1 have great potential for handling complex outcome information. The unsupervised nature of LV strategies makes highly flexible data synthesis and organization possible. The same nature, however, can also be seen as esoteric and subjective, which is undesirable in situations where transparency and reproducibility are of great concern, such as risk prediction.
As a practical solution to this problem, we propose the use of explicit clinical validators, which not only makes LV-based labels closely aligned with contemporary science and clinical practice, but also makes it possible to automatically validate and narrow a large pool of candidate labels. With the goal of developing a practical and transparent system of learning and inference for clinical research and practice, we formed a highly interdisciplinary team of researchers with expertise in latent variable modeling, machine learning, psychometrics and causal inference along with clinical/substantive expertise. Our streamlined learning framework focuses on direct and transparent validation of latent variable solutions to ensure clear communication across risk model developers, clinical researchers and practitioners. The project ultimately aims to improve personalized treatment and care by improving risk prediction.
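
The 3-stage flow described in the abstract (generate latent labels, validate them against explicit validators, then run supervised learning on the labeled data) can be illustrated with a toy sketch. This is not the project's method: the data are synthetic, a two-class 1-D k-means stands in for the latent variable models of Stage 1, a simple difference in validator rates stands in for Stage 2, and a nearest-centroid rule stands in for Stage 3. All function names, variable names, and the 0.3 retention threshold are hypothetical choices made only for illustration.

```python
import random
from statistics import mean

def latent_labels(scores, iters=50):
    """Stage 1 stand-in: two-class 1-D k-means on an outcome score."""
    lo, hi = min(scores), max(scores)
    labels = []
    for _ in range(iters):
        labels = [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]
        lo = mean([s for s, l in zip(scores, labels) if l == 0])
        hi = mean([s for s, l in zip(scores, labels) if l == 1])
    return labels

def validator_gap(labels, validator):
    """Stage 2 stand-in: absolute gap in validator rate between the two classes."""
    rate1 = mean([v for v, l in zip(validator, labels) if l == 1])
    rate0 = mean([v for v, l in zip(validator, labels) if l == 0])
    return abs(rate1 - rate0)

def fit_centroid_classifier(x, labels):
    """Stage 3 stand-in: 1-D nearest-centroid rule on a baseline predictor."""
    m0 = mean([xi for xi, l in zip(x, labels) if l == 0])
    m1 = mean([xi for xi, l in zip(x, labels) if l == 1])
    cut = (m0 + m1) / 2
    sign = 1 if m1 >= m0 else -1
    return lambda xi: 1 if sign * (xi - cut) > 0 else 0

# Synthetic cohort (assumption): hidden risk status z drives outcomes y1, y2,
# an explicit clinical validator, and a baseline predictor x; y3 is pure noise.
rng = random.Random(0)
n = 400
z = [i % 2 for i in range(n)]
y1 = [3 * zi + rng.gauss(0, 1) for zi in z]
y2 = [3 * zi + rng.gauss(0, 1) for zi in z]
y3 = [rng.gauss(0, 1) for _ in z]
validator = [1 if rng.random() < (0.8 if zi else 0.2) else 0 for zi in z]
x = [2 * zi + rng.gauss(0, 1) for zi in z]

# Stage 1: generate a small pool of candidate latent labels.
candidates = {
    "composite_y1_y2": latent_labels([(a + b) / 2 for a, b in zip(y1, y2)]),
    "y3_only": latent_labels(y3),
}

# Stage 2: keep only candidates that separate the explicit validator.
gaps = {name: validator_gap(lab, validator) for name, lab in candidates.items()}
retained = [name for name, gap in gaps.items() if gap >= 0.3]

# Stage 3: supervised learning on the best-validated label (first half trains,
# second half is held out).
best = max(gaps, key=gaps.get)
labels = candidates[best]
half = n // 2
predict = fit_centroid_classifier(x[:half], labels[:half])
accuracy = mean([1 if predict(xi) == li else 0
                 for xi, li in zip(x[half:], labels[half:])])

print("validator gaps:", gaps)
print("retained candidates:", retained)
print("held-out accuracy:", accuracy)
```

In this sketch the validation step is what narrows the candidate pool automatically: a label built from outcomes unrelated to the validator shows almost no gap and is dropped before any prediction model is trained.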

Key facts

NIH application ID: 10212944
Project number: 5R01MH123443-02
Recipient: STANFORD UNIVERSITY
Principal Investigator: BOOIL JO
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $558,490
Award type: 5
Project period: 2020-07-08 → 2023-05-31