Cross Repository Metabolomics Data and Workflow Integration

NIH RePORTER · NIH · R03 · $302,764

Abstract

The lack of uniformity in published experimental methods and data is a major impediment to the research community's ability to compare, corroborate, and build upon biomedical discoveries. The FAIR data principles state that research data should be "findable, accessible, interoperable, and reusable." Several resources have made data easily findable and accessible in recent years, addressing the first two FAIR principles. These include public metabolomics data repositories and large-scale studies supported by the NIH Common Fund, such as Metabolomics Workbench and the Integrated Human Microbiome Project (iHMP). They also include other public mass spectrometry data repositories, such as the Global Natural Products Social Molecular Networking (GNPS) and MetaboLights. Unfortunately, the metabolomics community has not yet adequately addressed the final two FAIR principles, which state that data should be interoperable and reusable. This prevents metabolomics data from multiple relevant studies from being compared and co-analyzed. This proposal aims to bridge this interoperability and reusability gap by harmonizing community standards and creating accompanying computational tools for data re-analysis. Specifically, this proposal will (1) standardize and convert mass spectrometry data formats (Aim 1); (2) harmonize experimental metadata and analysis results using a common controlled vocabulary with consistent semantics across all experiments (Aim 1); (3) develop web infrastructure to find and explore datasets by metadata (Aim 1); and (4) develop portable, reusable, and scalable cloud-enabled bioinformatics pipelines for co-analysis (Aim 2). Successful completion of these aims will enable the entire metabolomics community to corroborate published findings, discover new metabolites that become apparent only when datasets are co-analyzed, and test translational hypotheses across different model organisms.

Key facts

NIH application ID: 10576731
Project number: 1R03OD034493-01
Recipient: UNIVERSITY OF CALIFORNIA, SAN DIEGO
Principal Investigator: PIETER C DORRESTEIN
Activity code: R03
Funding institute: NIH
Fiscal year: 2022
Award amount: $302,764
Award type: 1
Project period: 2022-09-20 → 2024-09-19