Adapting machine learning methods to detect genetic loci specific to strictly defined MDD

NIH RePORTER · NIH · R21 · $208,389

Abstract

This project seeks to further our understanding of the genetic influences on Major Depressive Disorder (MDD). One approach to increasing sample sizes for molecular genetic studies of MDD, and thereby increasing power to detect genetic loci, is to assess individuals using surveys that are shorter and more efficient than full clinical assessments. This 'minimal phenotyping' leads to identification of risk loci that may not be specific to strictly defined MDD and can be associated with a variety of psychiatric phenotypes. While these discoveries are important for understanding the overall biology of complex mental and psychiatric outcomes, they offer little direct and actionable insight into the biological underpinnings of strictly defined MDD, which shows increased severity, impairment, and recurrence risk and accounts for a disproportionate share of disability and morbidity in comparison to liberally defined MDD. Recently, large biobanks surveying tens to hundreds of thousands of subjects across hundreds to thousands of variables and electronic health records (EHR) have become available to the scientific community. Combining rich phenotype data with genome-wide genotyping or sequencing offers an unprecedented opportunity to leverage these resources to advance discovery and understanding of the genetic influences on MDD. One major challenge is the lack of uniform measures that allow assessment of strictly defined MDD, impairment, severity, and recurrence risk. This lack of 'deep phenotyping', while pragmatic in allowing the assembly of large samples, creates challenges in accurately distinguishing controls, non-specific mild cases, and strictly defined cases. We have previously shown how machine learning (ML) analysis methods can leverage this type of heterogeneous, broad, but light collection of information to predict and quantify risk in subjects not deeply assessed. While there is significant room for improvement in these predictions, the resulting effective sample size and power to detect specific liability loci increased dramatically when this method was applied.
In Aim 1, we plan to evaluate two families of ML methods that can be used to predict unmeasured risk specific to strictly defined MDD. In Aim 2, we propose to use these predictions of risk in genetic association analyses to detect common genetic variation that influences risk specific to strictly defined MDD. Finally, we will make our biobank-adapted ML method pipeline available to the broader psychiatric genetics research community, which is expected to improve power and loci detection for other psychiatric disorders.

Key facts

NIH application ID
10196078
Project number
1R21MH126358-01
Recipient
RESEARCH TRIANGLE INSTITUTE
Principal Investigator
Bradley Todd Webb
Activity code
R21
Funding institute
NIH
Fiscal year
2021
Award amount
$208,389
Award type
1
Project period
2021-04-01 → 2023-03-31