# Statistical Methods for Early Disease Prediction and Treatment Strategy Estimation Using Biomarker Signatures

> **NIH R01** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2020 · $340,025

## Abstract

Neuropsychiatric disorders pose an immense burden on patients, families, and health care systems,
underscoring the urgent need to develop disease-modifying treatments. Research on neuropsychiatric disorders
(e.g., Alzheimer’s disease, Parkinson’s disease) faces unique challenges, including the fact that these disorders
typically have a late onset and slow progression, the diagnostic criteria are based on subjective clinical symptoms,
and there is substantial disease and subject heterogeneity. In the proposed work, we aim to tackle these challenges
by leveraging complementary contributions from multiple biomarkers, including genome-wide polymorphisms,
whole brain neuroimaging, biofluids, and comprehensive neuropsychiatric assessments. We develop
sophisticated analytic tools with higher resolution and improved accuracy by accounting for biological mechanisms
of disease, synthesizing dynamic system-wide information, and integrating multiple sources of biomarkers.
These methods are applied to clinical data collected by the investigative team or available from large international
consortia in order to model the earliest pathological changes of neurodegenerative disease, assess treatment
responses, and inform the design of early-intervention clinical trials and the discovery of optimal personalized
therapies. Specifically, in Aim 1, we develop efficient methods for multi-level semiparametric transformation models
to estimate and test the effects of genetic variants on various types of complex phenotypes to inform genetic
counseling and improve clinical trial efficiency. Our methods do not rely on full pedigree genotyping and provide
family-specific substructure, in addition to population substructure, to better control confounding and reduce false
discovery rates in genome-wide association studies. In Aim 2, we develop large-scale nonlinear dynamic systems
through ordinary differential equations with random inflections to understand early pathological changes and
identify subjects with preclinical signs. Our method provides multi-domain integration of ensembles of biomarker
dynamics. In Aim 3, we develop dynamic hazards models and incorporate dynamic network structures to estimate
biomarker profiles that evolve smoothly with disease progression for earlier disease diagnosis. We account for
irregularly measured biomarkers and biological network dependence among biomarkers. In Aim 4, we develop
doubly robust and efficient machine learning methods to identify predictive markers, estimate optimal individualized
therapies, and identify subgroups who may receive the greatest benefit from therapy, with minimal risk.
In each aim, we will validate the proposed methods through extensive simulation studies and demonstrate their
practical value via application to real-world clinical studies. We establish theoretical properties of the proposed
methods using modern empirical process theory and statistical learning theory. Together, the state-of-the-art analytic
met...
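
The abstract does not specify which doubly robust estimator Aim 4 uses. As a rough illustration of the general idea only, the sketch below applies the standard augmented inverse probability weighting (AIPW) estimator of an average treatment effect to simulated data. It combines a propensity model with per-arm outcome regressions and remains consistent if either one is correctly specified. All names (`fit_logistic`, `aipw_ate`), the simulated data-generating model, and the clipping thresholds are assumptions made for this example and are not taken from the grant.

```python
import numpy as np


def fit_logistic(X, a, n_iter=25):
    """Fit a logistic regression by Newton-Raphson and return the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (a - p)
        # Small ridge term keeps the Hessian invertible (illustrative choice).
        hess = (X * w[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def aipw_ate(x, a, y):
    """AIPW estimate of the average treatment effect E[Y(1) - Y(0)]."""
    X = np.column_stack([np.ones(len(x)), x])

    # Propensity model: P(A = 1 | X), clipped away from 0 and 1.
    ps = 1.0 / (1.0 + np.exp(-(X @ fit_logistic(X, a))))
    ps = np.clip(ps, 0.01, 0.99)

    # Outcome models: separate least-squares fits in each treatment arm.
    b1 = np.linalg.lstsq(X[a == 1], y[a == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(X[a == 0], y[a == 0], rcond=None)[0]
    m1, m0 = X @ b1, X @ b0

    # Outcome-model contrast plus inverse-probability-weighted residuals.
    psi = m1 - m0 + a * (y - m1) / ps - (1 - a) * (y - m0) / (1.0 - ps)
    return psi.mean()


# Simulated example (hypothetical data): the true treatment effect is 2.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))
y = 1.0 + 2.0 * a + x + rng.normal(size=n)

estimate = aipw_ate(x, a, y)
print(f"AIPW estimate of the treatment effect: {estimate:.3f}")
```

In this simulation treatment assignment depends on the covariate, so a naive difference in arm means would be confounded; the AIPW estimate should land close to the simulated effect of 2.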

## Key facts

- **NIH application ID:** 9927686
- **Project number:** 5R01NS073671-08
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Yuanjia Wang
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $340,025
- **Award type:** 5
- **Project period:** 2011-07-15 → 2022-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9927686

## Citation

> US National Institutes of Health, RePORTER application 9927686, Statistical Methods for Early Disease Prediction and Treatment Strategy Estimation Using Biomarker Signatures (5R01NS073671-08). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9927686. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
