# Penalized mixture cure models for identifying genomic features associated with outcome in acute myeloid leukemia

> **NIH R01** · OHIO STATE UNIVERSITY · 2024 · $258,743

## Abstract

Molecular features associated with time-to-event outcomes, such as overall or disease-free survival, may be
prognostically relevant or potential therapeutic targets. Therefore, analyzing data from high-throughput genomic
assays with clinical follow-up data has been of growing interest. The Cancer Genome Atlas (TCGA) Project has
collected baseline demographics, clinical characteristics, and follow-up data for 11,125 patients across 32 different
cancer types, and corresponding tissue samples were processed to examine SNPs, copy number, methylation,
miRNA expression, and mRNA expression. Because the number of variables (P) exceeds the sample size (N),
one strategy frequently employed when associating molecular features with survivorship data is to fit univariable
Cox proportional hazards (PH) models followed by adjustment for multiple hypothesis tests using a false discovery
rate approach. However, most chronic conditions and diseases, including cancer, are likely caused by multiple
dysregulated genes or mutations. It is therefore critical to fit multivariable models in the presence of a
high-dimensional covariate space. Traditional statistical methods cannot be used when the number of features exceeds
the sample size (i.e., P > N), whereas penalized methods perform automatic variable selection and accommodate
the P > N scenario. Penalized approaches including LASSO, smoothly clipped absolute deviation (SCAD),
adaptive LASSO, and Bayesian LASSO have all been extended to Cox's PH model for handling high-dimensional
covariate spaces. However, when modeling survival or other time-to-event outcomes, the Cox PH model assumes
that all subjects will experience the event of interest, which is violated when a subset of subjects are cured.
Instead, when a subset of subjects in the data are cured, mixture cure models should be fit. Although mixture
cure models have been described for traditional settings where the number of samples exceeds the number
of covariates, limited variable selection methods and no methods for high-dimensional model fitting currently
exist for mixture cure models. Therefore, this project will overcome a critical barrier to progress in this field
by developing penalized parametric and semi-parametric mixture cure models applicable for high-dimensional
datasets. The specific aims of this application are to: (1) Develop penalized parametric mixture cure models
for high-dimensional datasets; and (2) Develop a penalized semi-parametric proportional hazards mixture cure
model for high-dimensional datasets. For both aims we will characterize the performance of the methods using
extensive simulation studies, develop software, and distribute R packages to CRAN. In aim (3) we will identify
molecular features associated with cure and survival using our large unique AML dataset from the Alliance for
Clinical Trials in Oncology and assess robustness of findings using AML datasets from Gene Expression Omnibus
and The Cancer Genome Atlas project. This research...
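
To make the idea of a penalized parametric mixture cure model concrete, here is a minimal Python sketch. It is not the project's method or software. It assumes the textbook mixture formulation, in which population survival is S_pop(t) = (1 - p) + p * S_u(t), where p is the probability of being uncured (susceptible) and S_u is the survival function of the uncured. The exponential latency distribution, the logistic incidence model, the restriction of covariates to the incidence part, and all function and parameter names are illustrative assumptions.

```python
import math


def sigmoid(v):
    """Logistic function, used here for the probability of being uncured."""
    return 1.0 / (1.0 + math.exp(-v))


def population_survival(t, p_uncured, rate):
    """Mixture cure survival: S_pop(t) = (1 - p) + p * S_u(t).

    Assumption: exponential latency, S_u(t) = exp(-rate * t).
    """
    return (1.0 - p_uncured) + p_uncured * math.exp(-rate * t)


def penalized_neg_loglik(times, events, X, b0, b, log_rate, lam):
    """L1-penalized negative log-likelihood of the sketch model.

    times    : follow-up times
    events   : 1 if the event was observed, 0 if censored
    X        : list of covariate vectors (incidence part only, by assumption)
    b0, b    : intercept and coefficients of the logistic incidence model
    log_rate : log of the exponential latency rate
    lam      : LASSO penalty weight on b (the intercept is not penalized)
    """
    rate = math.exp(log_rate)
    nll = 0.0
    for t, d, x in zip(times, events, X):
        p = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))
        if d == 1:
            # Observed event: the subject is uncured, contributing p * f_u(t),
            # where f_u(t) = rate * exp(-rate * t).
            nll -= math.log(p) + log_rate - rate * t
        else:
            # Censored: the subject is either cured or uncured and event-free.
            nll -= math.log(population_survival(t, p, rate))
    return nll + lam * sum(abs(bj) for bj in b)
```

Under this formulation the population survival curve levels off at the cured fraction 1 - p rather than falling to zero, which is the feature a standard Cox PH model cannot represent. In a P > N setting, the penalized objective would be minimized with an optimizer suited to non-smooth penalties (for example, coordinate descent), and the L1 term would shrink many coefficients in `b` to exactly zero.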

## Key facts

- **NIH application ID:** 10749898
- **Project number:** 5R01LM013879-03
- **Recipient organization:** OHIO STATE UNIVERSITY
- **Principal Investigator:** Kellie J. Archer
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $258,743
- **Award type:** 5
- **Project period:** 2022-01-01 → 2025-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10749898

## Citation

> US National Institutes of Health, RePORTER application 10749898, Penalized mixture cure models for identifying genomic features associated with outcome in acute myeloid leukemia (5R01LM013879-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10749898. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*