DMS/NIGMS 2: A Stability Driven Recommendation System for Efficient Disease Mechanistic Discovery

NIH RePORTER · NIH · R01 · $271,288

Abstract

Overview. Uncovering the biological features that underlie disease mechanisms is crucial for developing effective treatments and therapies. Typically, this is done in two stages. In stage 1, statistical analyses are used to recommend candidate variants/genes for follow-up investigation. In stage 2, researchers conduct costly experiments, clinical trials, or external studies in independent cohorts to validate or establish causality between candidate features and disease traits. To minimize costs, recommendations should be replicable and lead to high-yield experiments. These recommendations are often generated through genome-wide association study (GWAS) methods based on linear mixed models. Despite the successes of GWAS, a substantial heritability gap remains, limiting the applicability of these associations in clinical practice. Several key issues can contribute to missing heritability, including the need for more informative, multi-modal features; unidentified non-linear and epistatic effects; linkage disequilibrium among variants; and heterogeneous sources of variability. To confront these challenges, we propose a reality-checked, stability-driven feature recommendation system based on decision trees that aims for efficient discoveries and high experimental yield. We build upon iterative random forests (iRF) and the veridical data science framework based on the principles of Predictability, Computability and Stability (PCS) developed by the PI to propose a number of novel advances for stage 1. We propose: (1) generalized MDI (gMDI), a stability-driven non-linear feature importance measure for improving iRF recommendations; (2) dependence-aware feature and interaction discovery; (3) supervised local feature importance for heterogeneous mechanistic discoveries; and (4) validation through gene-silencing experiments. Importantly, we generate multi-modal features to extract information across the genome.

Intellectual Merit. Our proposals improve MDI-based methods by addressing drawbacks of MDI and tailoring them to problem settings; incorporate gMDI and dependence structure in iRF; and detect heterogeneous sources of noise. Each aim will be vetted and will follow the veridical data science framework. In the case study, we will recommend genes and interactions for gene-silencing experiments, which will supply valuable insights into genetic mechanisms underlying traits related to cardiac hypertrophy. The results of this work will impact mechanistic discovery for complex diseases and advance statistical methodology.
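As a rough illustration of what "stability-driven" feature recommendation can mean, the sketch below scores each feature by how often it ranks first across bootstrap resamples. It is not the gMDI or iRF method described in the abstract. The importance used here is a deliberately simplified stand-in: the largest variance (impurity) decrease achievable by a single threshold split, which is the quantity a depth-1 regression tree would contribute to a mean-decrease-in-impurity score. The function names, the toy data, and the parameters `n_boot` and `top_k` are all assumptions made for this example.

```python
import random
from statistics import pvariance


def best_split_gain(xs, ys):
    """Largest variance decrease obtainable from one threshold split on xs."""
    n = len(ys)
    parent = pvariance(ys)
    pairs = sorted(zip(xs, ys))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Size-weighted impurity of the two child nodes.
        child = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        best = max(best, parent - child)
    return best


def stability_scores(X, y, n_boot=50, top_k=1, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        y_b = [y[i] for i in idx]
        gains = [best_split_gain([X[i][j] for i in idx], y_b) for j in range(p)]
        top = sorted(range(p), key=lambda j: gains[j], reverse=True)[:top_k]
        for j in top:
            counts[j] += 1
    return [c / n_boot for c in counts]


# Hypothetical toy data: only feature 0 drives the response (a step effect plus noise).
rng = random.Random(1)
X = [[rng.random() for _ in range(3)] for _ in range(60)]
y = [(1.0 if row[0] > 0.5 else 0.0) + 0.1 * rng.gauss(0, 1) for row in X]

scores = stability_scores(X, y, n_boot=30, top_k=1)
print(scores)
```

A feature that is recommended in nearly every resample is a more replicable candidate for costly follow-up than one that ranks highly only occasionally; a real analysis would replace the single-split score with a forest-based importance and would also handle correlated features.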

Key facts

NIH application ID: 10933563
Project number: 5R01GM152718-02
Recipient: UNIVERSITY OF CALIFORNIA BERKELEY
Principal Investigator: Bin Yu
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $271,288
Award type: 5
Project period: 2023-09-25 → 2027-06-30