# Methods for High-Dimensional Statistical Inference and Individualized Risk Prediction under Semi-Competing Risks

> **NIH F31** · HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH · 2021 · $34,886

## Abstract

Patient care has been transformed by the availability of high-dimensional sources like electronic health records
(EHR) and genomic data, allowing health care decisions to be tailored to individual patients. Statistical methods
have been developed to efficiently use such high-dimensional data, but critical gaps still remain. Several common
models for survival analysis have recently been extended to accommodate high-dimensional variable selection
and machine learning prediction methods, but similar tools have not yet been developed for the setting of
semi-competing risks. In the semi-competing risks setting, interest focuses on jointly modeling both a terminal
time-to-event outcome and a non-terminal time-to-event outcome that can only occur for subjects who have
not yet experienced the terminal event. Examples of this exist in severe pregnancy-related diseases such as
pre-eclampsia (PE - further described below). PE and subsequent delivery are natural semi-competing risks,
as PE can develop before delivery, but not after. Current methods do not provide analysts with data-driven tools
for uncovering important covariates from high-dimensional data, and clinicians lack meaningful, personalized
predictions of patients' joint probability of experiencing one or both outcomes prospectively through time.
 This proposal addresses these methodological gaps with tools for high-dimensional inference and prediction.
In Aim 1, I will address the challenge of variable selection by developing a suite of regularized estimators for
selecting important covariates from large datasets into a semi-competing risks model, and evaluating performance
by simulation. In Aim 2, I will create a deep feedforward neural network modeling framework for predicting
individual patients' joint probabilities of experiencing one or both outcomes of interest across future time points.
Together, these aims will improve personalization of health care decisions. Software will be developed that
provides researchers practical and user-friendly tools for applying these methods. In Aim 3, I will apply these
approaches for semi-competing risks to evaluate risk of PE, which is globally a leading cause of maternal and
fetal/neonatal mortality and morbidity. Using EHR pregnancy data from 50,000 births between 2011 and 2020, I
will use the proposed variable selection methods to develop a model identifying risk factors for PE along with
factors affecting time-to-delivery among PE patients. Through this work, I will also build a deep learning model
in order to jointly predict maternal PE and NICU admission of the infant, yielding personalized prediction plots to
facilitate care decisions that balance maternal and fetal health risks. For ease of use by clinicians and patients,
I will disseminate this prediction model using an interactive online tool.
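
The semi-competing risks structure described above can be made concrete with a small sketch of how one subject's observed data might be encoded. This is an illustration only, not the project's software: the names `Observed` and `observe`, the tie-handling rule, and the gestational-week values in the usage note are all hypothetical. It assumes the common encoding in which each subject contributes an observed time and an event indicator for both the non-terminal event (e.g. PE) and the terminal event (e.g. delivery), subject to right censoring.

```python
import math
from typing import NamedTuple, Optional


class Observed(NamedTuple):
    """Observed data for one subject (hypothetical container)."""
    y1: float  # time of the non-terminal event, or time follow-up for it ended
    d1: int    # 1 if the non-terminal event was observed, else 0
    y2: float  # time of the terminal event, or censoring time
    d2: int    # 1 if the terminal event was observed, else 0


def observe(t_nonterminal: Optional[float],
            t_terminal: float,
            t_censor: float) -> Observed:
    """Map latent event times to observed semi-competing risks data.

    The non-terminal event can only occur before the terminal event, so a
    non-terminal time at or after the terminal time is treated as "never
    occurs". Treating ties this way is an assumption of this sketch.
    """
    if t_nonterminal is None or t_nonterminal >= t_terminal:
        t1 = math.inf
    else:
        t1 = t_nonterminal

    # The terminal event is subject only to right censoring.
    y2 = min(t_terminal, t_censor)
    d2 = int(t_terminal <= t_censor)

    # The non-terminal event is cut off by whichever comes first:
    # the terminal event or censoring.
    y1 = min(t1, y2)
    d1 = int(t1 <= y2)
    return Observed(y1, d1, y2, d2)
```

With illustrative times in weeks, `observe(34, 37, 42)` records both events (`y1=34, d1=1, y2=37, d2=1`), while `observe(None, 39, 42)` records the terminal event only, with the non-terminal time cut off at the terminal time (`y1=39, d1=0, y2=39, d2=1`). The asymmetry is the defining feature: the terminal event truncates the non-terminal one, but not the reverse.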

## Key facts

- **NIH application ID:** 10249946
- **Project number:** 5F31HD102159-02
- **Recipient organization:** HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH
- **Principal Investigator:** Harrison Reeder
- **Activity code:** F31
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $34,886
- **Award type:** 5
- **Project period:** 2020-09-01 → 2022-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10249946

## Citation

> US National Institutes of Health, RePORTER application 10249946, Methods for High-Dimensional Statistical Inference and Individualized Risk Prediction under Semi-Competing Risks (5F31HD102159-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10249946. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
