# Data-driven subtyping in major depressive disorder

> **NIH R01** · MASSACHUSETTS GENERAL HOSPITAL · 2021 · $832,192

## Abstract

 Major depressive disorder contributes substantially to morbidity, mortality, and health care cost.
Standard treatments are ineffective for up to a third of patients, so new treatment options are needed
along with strategies to make more effective use of existing treatments. However, progress in
expanding therapeutic options has been hindered by heterogeneity in clinical presentation and course
of depression.

In other disorders such as inflammatory bowel disease, cancer, and dementia, identifying
disease subtypes has led to therapeutic discoveries. In major depressive disorder, efforts to identify
subtypes based on clinical observation have had limited success, primarily because adequate cohorts
for replication have been unavailable, and because the features most apparent to
clinicians may not be the most relevant for differentiating subgroups. Efforts to leverage large
electronic health record data sets for subtyping address some of these challenges, but standard
approaches may not yield features that are human-interpretable or useful for prediction.

The investigators have developed methods for engineering features that balance utility in
prediction with interpretability. Preliminary work by the investigators during a year of R56 support,
which yielded 4 publications, demonstrates that this approach yields coherent topics without
sacrificing predictive validity and that electronic health records contain meaningful data that facilitate
identification of interpretable patient subgroups. The present study draws on very large cohorts of
individuals with major depression, defined by a validated algorithm, in electronic health records from
two health systems. It will first apply methods developed by the investigators to identify MDD
subtypes. These subtypes will then be examined in terms of predictive validity as well as
interpretability by clinicians.

The study builds on a productive collaboration within a team experienced in mood disorder
phenotyping and clinical investigation, analysis of large-scale longitudinal electronic health records,
and development and application of innovative machine learning methods that yield interpretable
models rather than black boxes. Data-driven disease subtyping will facilitate clinically useful risk
stratification as well as biological study of mood disorders.

## Key facts

- **NIH application ID:** 10211310
- **Project number:** 1R01MH123804-01A1
- **Recipient organization:** MASSACHUSETTS GENERAL HOSPITAL
- **Principal Investigator:** Roy H. Perlis
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $832,192
- **Award type:** 1
- **Project period:** 2021-04-16 → 2025-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10211310

## Citation

> US National Institutes of Health, RePORTER application 10211310, Data-driven subtyping in major depressive disorder (1R01MH123804-01A1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10211310. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
