# Machine Learning Tools for Discovery and Analysis of Active Metabolic Pathways

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2020 · $336,869

## Abstract

DESCRIPTION (provided by applicant): This project aims to develop new statistical machine learning methods for metabolomics data from diverse platforms, including targeted and unbiased/global mass spectrometry (MS), labeled MS experiments for measuring metabolic flux, and Nuclear Magnetic Resonance (NMR) platforms. Unbiased MS and NMR profiling studies result in identifying a large number of unnamed spectra, which cannot be directly matched to known metabolites and are hence often discarded in downstream analyses. The first aim develops a novel kernel penalized regression method for analysis of data from unbiased profiling studies. It provides a systematic framework for extracting the relevant information from unnamed spectra through a kernel that highlights the similarities and differences between samples, and in turn boosts the signal from named metabolites. This results in improved power in identification of named metabolites associated with the phenotype of interest, as well as improved prediction accuracy. An extension of this kernel-based framework is also proposed to allow for systematic integration of metabolomics data from diverse profiling studies, e.g. targeted and unbiased MS profiling technologies. The second aim provides a formal inference framework for kernel penalized regression and thus complements the discovery phase of the first aim. The third aim focuses on metabolic pathway enrichment analysis that tests both orchestrated changes in activities of steady-state metabolites in a given pathway and aberrations in the mechanisms of metabolic reactions. The fourth aim of the project provides a unified framework for network-based integrative analysis of static (based on mass spectrometry) and dynamic (based on metabolic flux) metabolomics measurements, thus providing an integrated view of the metabolome and the fluxome. Finally, the last aim implements the proposed methods in easy-to-use open-source software leveraging the R language, the capabilities of the Cytoscape platform, and the Galaxy workflow system, thus providing an expandable platform for further developments in the area of metabolomics. The proposed software tool will also provide a plug-in to the Data Repository and Coordination Center (DRCC) data sets, where all regional metabolomics centers supported by the NIH Common Fund Metabolomics Program deposit curated data.
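The abstract does not give the estimator itself, so the following is only a minimal sketch of one way the first aim's idea could look, not the project's actual method. It assumes a linear kernel between samples built from the unnamed spectra and a ridge penalty on the named-metabolite coefficients, solved in closed form. The function names (`sample_kernel`, `kernel_penalized_ridge`), the simulated data, and the choice of kernel and penalty are all hypothetical.

```python
import numpy as np

def sample_kernel(Z):
    """Linear kernel between samples from unnamed spectra Z (n x q).

    Assumption: a centered linear kernel, rescaled so its trace equals n.
    """
    Zc = Z - Z.mean(axis=0)
    K = Zc @ Zc.T
    return K * (K.shape[0] / np.trace(K))

def kernel_penalized_ridge(X, y, H, lam=1.0):
    """Minimize (y - X b)^T H (y - X b) + lam * ||b||^2 over b.

    X holds named metabolites (n x p); H is an n x n sample kernel.
    The closed-form solution is b = (X^T H X + lam I)^{-1} X^T H y.
    """
    p = X.shape[1]
    A = X.T @ H @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ H @ y)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p, q = 40, 5, 30
X = rng.normal(size=(n, p))          # named metabolites
Z = rng.normal(size=(n, q))          # unnamed spectra
beta_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

H = sample_kernel(Z)
beta_hat = kernel_penalized_ridge(X, y, H, lam=0.5)
```

With `H` set to the identity matrix and `lam=0`, this sketch reduces to ordinary least squares, which is a convenient sanity check. The sample kernel is the only place the unnamed spectra enter the fit.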

## Key facts

- **NIH application ID:** 9899255
- **Project number:** 5R01GM114029-05
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** ALI SHOJAIE
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $336,869
- **Award type:** 5
- **Project period:** 2016-04-01 → 2022-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9899255

## Citation

> US National Institutes of Health, RePORTER application 9899255, Machine Learning Tools for Discovery and Analysis of Active Metabolic Pathways (5R01GM114029-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9899255. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
