# Addressing Sparsity in Metabolomics Data Analysis

> **NIH U01** · UNIVERSITY OF COLORADO DENVER · 2021 · $365,125

## Abstract

Comprehensive profiling of the small molecule repertoire in a sample is referred to as metabolomics, and is being
used to address a variety of scientific questions in biomedical studies. Metabolomics offers more immediate
measures of the physiology of an individual, and more direct examination of the effects of exposures such as
nutrition, smoking and bacterial infections. For human health, metabolomics studies are being used to investigate
disease mechanisms, discover biomarkers, diagnose disease, and monitor treatment responses. Metabolomics
is increasingly recognized as an important component of precision medicine initiatives to complement and
enhance collected genomic data. This is critical as the metabolome cannot be predicted from knowledge of the
genome, transcriptome or proteome, but provides important information on the phenotype. Recent technological
advances in mass spectrometry-based metabolomics have allowed for more comprehensive and sensitive
measurements of metabolites. We focus on untargeted ultra-high pressure liquid chromatography coupled to
mass spectrometry, which is one of the more commonly used methods. Despite the technological advances,
the bottleneck for taking full advantage of metabolomics data is often the paucity and incompleteness of
analytical tools and databases. Our goal is to develop novel statistical methods and software for the research
community to improve the utilization of metabolomics data. There are many steps in a metabolomics data
analysis pipeline, and we will focus on the downstream steps: normalization and univariate, multivariate, and
pathway analyses. In particular, we will address the high levels of sparsity, which is one of the more distinctive
aspects of metabolomics data compared with other -omics data sets. For metabolomics data, there is sparsity in
individual metabolites due to a large percentage of missing data for biological or technical reasons, and sparsity
in connections between metabolites due to high collinearity and sparsely connected networks in metabolic
pathways. The methods and software we develop will maximize the potential of metabolomics to provide new
discoveries in disease etiology, diagnosis, and drug development.

## Key facts

- **NIH application ID:** 10252042
- **Project number:** 5U01CA235488-04
- **Recipient organization:** UNIVERSITY OF COLORADO DENVER
- **Principal Investigator:** Debashis Ghosh
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $365,125
- **Award type:** 5
- **Project period:** 2018-09-18 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10252042

## Citation

> US National Institutes of Health, RePORTER application 10252042, Addressing Sparsity in Metabolomics Data Analysis (5U01CA235488-04). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10252042. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
