# Cross Repository Metabolomics Data and Workflow Integration

> **NIH R03** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2022 · $302,764

## Abstract

The lack of uniformity in published experimental methods and data is a major impediment for the
research community to compare, corroborate, and build upon biomedical discoveries. The FAIR
data principles state that research data should be “findable, accessible, interoperable, and
reusable.” Public metabolomics data repositories and large-scale studies supported by the NIH
Common Fund, including Metabolomics Workbench and the Integrated Human Microbiome
Project (iHMP), and other public mass spectrometry data repositories, such as the Global Natural
Products Social Molecular Networking (GNPS) and MetaboLights, have made progress in recent
years to address the first two FAIR principles by making metabolomics data easily findable and
accessible. Unfortunately, the final two FAIR principles, which state that data should be
interoperable and reusable, have not been adequately addressed yet by the metabolomics
community. As a result, metabolomics data from multiple relevant studies cannot be compared and
co-analyzed. This proposal aims to bridge this interoperability and reusability gap by harmonizing
community standards and creating accompanying computational tools for data re-analysis.
Specifically, this proposal will (1) standardize and convert mass spectrometry data formats (Aim
1); (2) harmonize experimental metadata and analysis results using a common controlled vocabulary
with consistent semantics across all experiments (Aim 1); (3) develop web infrastructure to find
and explore datasets by metadata (Aim 1); and (4) develop cloud-enabled, portable, reusable, and
scalable co-analysis bioinformatics pipelines (Aim 2). Successful completion of these aims will
give the entire metabolomics community the ability to corroborate published findings,
discover new metabolites that are highlighted only when co-analyzing datasets, and test
translational hypotheses across different model organisms.

## Key facts

- **NIH application ID:** 10576731
- **Project number:** 1R03OD034493-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** PIETER C DORRESTEIN
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $302,764
- **Award type:** 1
- **Project period:** 2022-09-20 → 2024-09-19

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10576731

## Citation

> US National Institutes of Health, RePORTER application 10576731, Cross Repository Metabolomics Data and Workflow Integration (1R03OD034493-01). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10576731. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
