# Deep Learning Models for Metabolomics Analysis

> **NIH R35** · TUFTS UNIVERSITY MEDFORD · 2024 · $62,824

## Abstract

PROJECT SUMMARY
Untargeted metabolomics using tandem mass spectrometry (MS) has attained substantial success in
discovering biomarkers and advancing our understanding of cellular metabolism. Despite this success, only a
small fraction of measured spectra can currently be annotated (assigned a chemical identity). This bottleneck
can be attributed to the limitations of current annotation tools that have not yet exploited advances in deep
learning and available data modalities (spectra, peaks, molecules, and fragments). The goal of this application
is to advance the interpretation of spectra collected through untargeted metabolomics. We focus on annotating
data collected through liquid or gas chromatography followed by MS or MS/MS, as these three tandem
technologies have become dominant. Over the next five years, the plan is to harness deep learning
to address three problems: 1) annotation, 2) translation between spectra measured under different instrument
settings, and 3) explainable models for annotation, where explainability arises from connecting peaks to their
respective molecular fragments.
 The Hassoun lab has extensive, relevant deep learning experience to effectively tackle these problems.
The Lab also has experience in dealing with the nuances of metabolomics datasets. The Lab recently developed
a novel deep learning annotation model that achieves 41% and 30% performance improvements over multi-layer
neural networks and graph neural networks, respectively. Additionally, our lab has developed an ontology-
traversal algorithm that yields correct-by-construction molecular substructures that can be assigned to peaks,
thus giving rise to datasets that can be used to train explainable annotation models.
 The Significance of this research is that it addresses fundamental barriers that hinder developing deep
learning annotation models. Our models and datasets will be released on GitHub to benefit biological and
biomedical applications and metabolomics research. Because of their expected high accuracy and explainability,
the models will expedite the interpretation of experiments, improve our understanding of cellular metabolism,
and facilitate data sharing among labs. The innovation lies in maximally learning from data modalities and in
creating models that exploit the learned representations. Further, the annotation and translation problems are
formulated as a bidirectional mapping between domains, in contrast to current annotation models that assume
unimodal mappings. These innovations are necessary to advance metabolomics research and they will open
new research horizons in the field of metabolomics.
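
To make the annotation task concrete, the following is a minimal sketch of a classical baseline, not the deep learning model described in the abstract: a query spectrum is binned by m/z and ranked against a small library by cosine similarity. The function names, the 1.0 bin width, the candidate names, and all peak values are hypothetical and chosen only for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate(query_peaks, library, bin_width=1.0):
    """Rank library candidates by similarity to the query, best first."""
    query = bin_spectrum(query_peaks, bin_width)
    scored = [
        (name, cosine_similarity(query, bin_spectrum(peaks, bin_width)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up reference spectra; real libraries hold measured (m/z, intensity) peaks.
library = {
    "candidate_A": [(50.2, 10.0), (77.1, 100.0)],
    "candidate_B": [(41.0, 100.0), (91.3, 20.0)],
}

# Made-up query spectrum that shares its m/z bins with candidate_A.
ranked = annotate([(50.3, 12.0), (77.4, 95.0)], library)
print(ranked[0][0])
```

In this sketch the first entry of `ranked` is the top-scoring candidate. A learned annotation model would replace the fixed similarity function with a score learned from spectra and candidate molecules.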

## Key facts

- **NIH application ID:** 10868085
- **Project number:** 3R35GM148219-02S1
- **Recipient organization:** TUFTS UNIVERSITY MEDFORD
- **Principal Investigator:** Soha Hassoun
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $62,824
- **Award type:** 3
- **Project period:** 2023-04-01 → 2025-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10868085

## Citation

> US National Institutes of Health, RePORTER application 10868085, Deep Learning Models for Metabolomics Analysis (3R35GM148219-02S1). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10868085. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
