# Adapting machine learning methods to detect genetic loci specific to strictly defined MDD

> **NIH R21** · RESEARCH TRIANGLE INSTITUTE · 2021 · $208,389

## Abstract

 This project seeks to further our understanding of the genetic influences on Major Depressive Disorder
(MDD). One approach to increasing sample sizes for molecular genetic studies of MDD and thereby increasing
power to detect genetic loci is to assess individuals using surveys that are shorter and more efficient than full
clinical assessments. This 'minimal phenotyping' leads to identification of risk loci that may not be specific to
strictly defined MDD and can be associated with a variety of psychiatric phenotypes. While these discoveries are
important for understanding the overall biology of complex mental and psychiatric outcomes, they offer little direct
and actionable insight into the biological underpinnings of strictly defined MDD, which shows increased severity,
impairment, and recurrence risk and accounts for a disproportionate share of disability and morbidity
compared with liberally defined MDD. Recently, large biobanks surveying tens to hundreds of thousands of
subjects across hundreds to thousands of variables and EHR data have become available to the
scientific community. Combining rich phenotype data with genome-wide genotyping or sequencing offers an
unprecedented opportunity to leverage these resources to advance discovery and understanding of the genetic
influences on MDD. One major challenge is the lack of uniform measures that allow assessment of strictly defined
MDD, impairment, severity, and recurrence risk. This lack of 'deep phenotyping', while pragmatic in allowing the
assembly of large samples, makes it difficult to accurately distinguish controls, non-specific mild cases,
and strictly defined cases. We have previously shown how machine learning (ML) analysis methods can leverage
this type of heterogeneous, broad, but light collection of information to predict and quantify risk in subjects not
deeply assessed. While there is significant room for improvement in these predictions, the resulting effective
sample size and power to detect specific liability loci increased dramatically when this method was applied. In
Aim 1, we plan to evaluate two families of ML methods that can be used to predict unmeasured risk specific to strictly
defined MDD. In Aim 2, we propose to use these predictions of risk in genetic association analyses to detect
common genetic variation that influences risk specific to strictly defined MDD. Finally, we will make our
biobank-adapted ML method pipeline available to the broader psychiatric genetics research community, which is expected
to improve power and loci detection for other psychiatric disorders.

## Key facts

- **NIH application ID:** 10196078
- **Project number:** 1R21MH126358-01
- **Recipient organization:** RESEARCH TRIANGLE INSTITUTE
- **Principal Investigator:** BRADLEY Todd WEBB
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $208,389
- **Award type:** 1
- **Project period:** 2021-04-01 → 2023-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10196078

## Citation

> US National Institutes of Health, RePORTER application 10196078, Adapting machine learning methods to detect genetic loci specific to strictly defined MDD (1R21MH126358-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10196078. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*