Machine Learning Tools for Discovery and Analysis of Active Metabolic Pathways

NIH RePORTER · NIH · R01 · $336,869

Abstract

DESCRIPTION (provided by applicant): This project aims to develop new statistical machine learning methods for metabolomics data from diverse platforms, including targeted and unbiased/global mass spectrometry (MS), labeled MS experiments for measuring metabolic flux, and Nuclear Magnetic Resonance (NMR) platforms. Unbiased MS and NMR profiling studies yield a large number of unnamed spectra, which cannot be directly matched to known metabolites and are hence often discarded in downstream analyses. The first aim develops a novel kernel penalized regression method for analysis of data from unbiased profiling studies. It provides a systematic framework for extracting the relevant information from unnamed spectra through a kernel that highlights the similarities and differences between samples, and in turn boosts the signal from named metabolites. This results in improved power in identification of named metabolites associated with the phenotype of interest, as well as improved prediction accuracy. An extension of this kernel-based framework is also proposed to allow for systematic integration of metabolomics data from diverse profiling studies, e.g., targeted and unbiased MS profiling technologies. The second aim provides a formal inference framework for kernel penalized regression and thus complements the discovery phase of the first aim. The third aim focuses on metabolic pathway enrichment analysis that tests for both orchestrated changes in the activities of steady-state metabolites in a given pathway and aberrations in the mechanisms of metabolic reactions. The fourth aim of the project provides a unified framework for network-based integrative analysis of static (based on mass spectrometry) and dynamic (based on metabolic flux) metabolomics measurements, thus providing an integrated view of the metabolome and the fluxome.
Finally, the last aim implements the proposed methods in easy-to-use open-source software leveraging the R language, the capabilities of the Cytoscape platform, and the Galaxy workflow system, thus providing an expandable platform for further developments in the area of metabolomics. The proposed software tool will also provide a plug-in to the Data Repository and Coordination Center (DRCC) data sets, where all regional metabolomics centers supported by the NIH Common Fund Metabolomics Program deposit curated data.

Key facts

NIH application ID
9899255
Project number
5R01GM114029-05
Recipient
UNIVERSITY OF WASHINGTON
Principal Investigator
ALI SHOJAIE
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$336,869
Award type
5
Project period
2016-04-01 → 2022-03-31