# The promise of machine learning for novel approaches to archived developmental data

> **NIH R21** · UNIVERSITY OF PITTSBURGH AT PITTSBURGH · 2024 · $234,231

## Abstract

The availability of large data sets from research studies via data depositories is believed to be
critical to tackle key questions about “complex diseases” like psychiatric disorders (Farber, 2017; Patel et al.,
2022). Consequently, NIMH-supported investigations have recently been mandated to archive all their data.
Notably, NIMH is currently funding the archiving of three consecutive grants we completed years ago, the data
from which would otherwise be lost to the research community. These projects (plus a recently completed one)
together constitute a longitudinal database on the course and outcome of young patients with research
diagnoses of depressive disorders that onset in the juvenile years (data on biological siblings and controls are
also available). Juvenile-onset depression (JOD) is a particularly malignant depression phenotype, with a
worse overall clinical course and greater functional impairment than later onset depression, and is still not fully
understood. Our aim is to develop prototype machine learning (ML) algorithms (which can be customized as
needed) to facilitate the analyses of the longitudinal data being archived in the National Data Archive (NDA).
The data reflect repeated assessments from ages 7 to 14 years (at the start of study 1) to the
late 20s to early 30s (end of study 4) on multiple domains of functioning and can yield actionable information
about which risk and protective variables/domains best predict clinical and functional outcomes of JOD (e.g.,
depression recurrence, suicidal behavior, emotional competence). Because commonly used modelling
approaches (which typically test a priori defined pathways) cannot accommodate the complexity of our data
and key questions about JOD, we demonstrate the novel application of ML approaches.
We propose that questions about JOD outcomes exemplify two scenarios. Scenario (A) includes questions
about well-established outcomes (e.g., depression recurrence) and a handful of well-known predictors but
meager information about the interrelationships among the predictors, particularly along the course of
development. Scenario (B) reflects questions about less established outcomes (e.g., successful emotion regulation),
the predictors of which are not well known or have only equivocal support. We will demonstrate how to
accommodate such scenarios through two ML approaches: probabilistic graphical modeling and ensemble
learning methods. We apply these modeling approaches within a developmental framework in a unique way to
leverage the wealth of longitudinal information on multiple domains of functioning. To enable researchers to
fully utilize the NDA-based (as well as similar) data, we will release the Python code packages we develop and
the code for downloading and properly organizing the related data. Our approach may shift current analytic
practices in developmental psychopathology research toward models that can optimize the use of such data,
r...

## Key facts

- **NIH application ID:** 10949256
- **Project number:** 1R21MH137601-01
- **Recipient organization:** UNIVERSITY OF PITTSBURGH AT PITTSBURGH
- **Principal Investigator:** MARIA KOVACS
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $234,231
- **Award type:** 1
- **Project period:** 2024-08-01 → 2026-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10949256

## Citation

> US National Institutes of Health, RePORTER application 10949256, The promise of machine learning for novel approaches to archived developmental data (1R21MH137601-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10949256. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
