# DMS/NIGMS 2: A Stability Driven Recommendation System for Efficient Disease Mechanistic Discovery

> **NIH R01** · UNIVERSITY OF CALIFORNIA BERKELEY · 2024 · $271,288

## Abstract

Overview. It is crucial to uncover the biological features underlying disease mechanisms to develop
effective treatments and therapies. Typically, this is done via a two-step process: in stage 1, statistical
analyses are used to recommend candidate variants/genes for follow-up investigation. In stage 2,
researchers conduct costly experiments, clinical trials, or external studies via independent cohorts to
validate or establish causality between candidate features and disease traits. To minimize costs,
recommendations should lead to high-yield experiments and be replicable. These recommendations are
often generated through GWAS methods, based on linear mixed models. Despite the successes of
GWAS, there still exists a substantial heritability gap limiting the applicability of these associations in
clinical practice. A number of key issues can contribute to missing heritability including: the need for more
informative, multi-modal features; unidentified non-linear and epistatic effects; linkage disequilibrium
among variants; and heterogeneous sources of variability. To confront these challenges, we propose a
reality-checked stability-driven feature recommendation system based on decision trees that aims at
efficient discoveries for high yields in experimentation. We build upon iterative random forests (iRF) and
the veridical data science framework based on the principles of Predictability, Computability and Stability
(PCS) developed by the PI to propose a number of novel advances for stage 1. We propose: (1)
generalized MDI (gMDI), a stability-driven non-linear feature importance measure for improving iRF
recommendations; (2) dependence-aware feature and interaction discovery; (3) supervised local feature
importance for heterogeneous mechanistic discoveries; and (4) validation through gene-silencing
experiments. Importantly, we generate multi-modal features to extract information across the genome.

Intellectual Merit. Our proposals: improve MDI-based methods by addressing drawbacks of MDI and
tailoring to problem settings; incorporate gMDI and dependence structure in iRF; and detect
heterogeneous sources of noise. Each aim will be vetted and follow the veridical data science framework.
In the case study, we will recommend genes and interactions for gene-silencing experiments. These will
supply valuable insights into genetic mechanisms underlying traits related to cardiac hypertrophy. Results
of this work will impact mechanistic discovery for complex diseases and advance statistical methodology.
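
The abstract's idea of a stability-driven, tree-based feature recommendation can be illustrated with a toy sketch. This is not gMDI, iRF, or the PCS framework as developed by the PI; it is a simplified stand-in that scores each feature by the impurity (variance) decrease of its best single split, repeats this over bootstrap resamples, and reports how often each feature ranks in the top `top_k`. The function names, the synthetic data, and all parameter values are assumptions made for illustration only.

```python
import random


def best_split_impurity_decrease(xs, ys):
    """Largest variance reduction from one threshold split on feature xs.

    This is the impurity decrease of a depth-1 tree (a stump), used here
    as a simplified stand-in for an MDI-style importance score.
    """
    n = len(ys)
    total = sum(ys)
    total_sq = sum(y * y for y in ys)
    parent = total_sq / n - (total / n) ** 2  # variance at the root
    order = sorted(range(n), key=lambda i: xs[i])
    best = 0.0
    left_sum = left_sq = 0.0
    for k in range(1, n):
        y = ys[order[k - 1]]
        left_sum += y
        left_sq += y * y
        # Only split between distinct feature values.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared errors in each child; dividing by n gives the
        # size-weighted child variance.
        left_sse = left_sq - left_sum ** 2 / k
        right_sse = right_sq - right_sum ** 2 / (n - k)
        best = max(best, parent - (left_sse + right_sse) / n)
    return best


def stability_scores(X, y, n_boot=50, top_k=1, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    hits = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        scores = [
            best_split_impurity_decrease([X[i][j] for i in idx], yb)
            for j in range(p)
        ]
        ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            hits[j] += 1
    return [h / n_boot for h in hits]


def make_toy_data(n=200, seed=1):
    """Hypothetical data: feature 0 acts through a threshold, features 1-2 are noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [(1.0 if row[0] > 0.5 else 0.0) + rng.gauss(0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(stability_scores(X, y, n_boot=20))
```

In this sketch, a feature recommended for follow-up would be one whose score stays high across resamples, which mirrors the abstract's emphasis on replicable recommendations. A real analysis would use full tree ensembles and would also need to handle correlated features.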

## Key facts

- **NIH application ID:** 10933563
- **Project number:** 5R01GM152718-02
- **Recipient organization:** UNIVERSITY OF CALIFORNIA BERKELEY
- **Principal Investigator:** Bin Yu
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $271,288
- **Award type:** 5
- **Project period:** 2023-09-25 → 2027-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10933563

## Citation

> US National Institutes of Health, RePORTER application 10933563, DMS/NIGMS 2: A Stability Driven Recommendation System for Efficient Disease Mechanistic Discovery (5R01GM152718-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10933563. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
