# Computational Techniques for Advancing Untargeted Metabolomics Analysis

> **NIH R01** · TUFTS UNIVERSITY MEDFORD · 2021 · $372,055

## Abstract

Detecting and quantifying products of cellular metabolism using mass spectrometry (MS) has already shown
great promise in biomarker discovery, nutritional analysis and other biomedical research fields. Despite recent
advances in analysis techniques, our ability to interpret MS measurements remains limited. The biggest
challenge in metabolomics is annotation, where measured compounds are assigned chemical identities. The
annotation rates of current computational tools are low: in several surveyed metabolomics studies, fewer than
20% of all compounds are annotated. A contributing factor to low annotation rates is the lack of systematic
ways of designing a candidate set, a listing of putative chemical identities that can be used during annotation.
Relying on existing databases is problematic because, given the large combinatorial space of molecular
arrangements, many biologically relevant compounds are not catalogued in databases or documented in
the literature. A secondary yet important challenge is interpreting the measurements to understand the metabolic
activity of the sample under study. Current techniques are limited in utilizing complex information about the
sample to elucidate metabolic activity.
The goal of this project is to develop computational techniques to advance the interpretation of large-scale
metabolomics measurements. To address current challenges, we propose to pursue three Aims: (1) Engineering
candidate sets that enhance biological discovery. (2) Developing new techniques for annotation including using
deep learning and incremental build-out methods to recommend novel chemical structures that best explain the
measurements. (3) Constructing probabilistic models to analyze metabolic activity. Each technique will be
rigorously validated computationally and experimentally using chemical standards. Two detailed case studies on
the intestinal microbiota will allow us to further validate our tools. Microbiota-derived metabolites have been
detected in circulation and shown to engage host cellular pathways in organs and tissues beyond the digestive
system. Identifying these metabolites is thus critical for understanding the metabolic function of the microbiota
and elucidating their mechanisms. The complex test cases will challenge our techniques, provide feedback
during development, and allow us to further disseminate our techniques. We will work closely with early adopters
of our tools, as proposed in supporting letters, to further validate our tools and encourage wide adoption. All
proposed tools will be open source and made accessible through the web. Our tools promise to change current
practices in interpreting metabolomics data beyond what is currently possible with databases, current annotation
tools, statistical and overrepresentation analysis, or combinations thereof. The use of machine learning and large
data sets as proposed herein defines the most promising research direction in metabolomi...

## Key facts

- **NIH application ID:** 10242075
- **Project number:** 5R01GM132391-03
- **Recipient organization:** TUFTS UNIVERSITY MEDFORD
- **Principal Investigator:** Soha Hassoun
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $372,055
- **Award type:** 5
- **Project period:** 2019-09-23 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10242075

## Citation

> US National Institutes of Health, RePORTER application 10242075, Computational Techniques for Advancing Untargeted Metabolomics Analysis (5R01GM132391-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10242075. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
