# Cross-Platform and Graphical Software Tool for Adaptive LC/MS and GC/MS Metabolomics Data Preprocessing

> **NIH U01** · UNIVERSITY OF NORTH CAROLINA CHARLOTTE · 2021 · $324,528

## Abstract

Data preprocessing is critical for the success of any MS-based untargeted metabolomics study, as it is the first
informatics step for making sense of the data. Despite the enormous contributions that existing software tools
have made to metabolomics, errors in compound identification and relative quantitation are still plaguing the field.
This issue is becoming more serious as the sensitivity of LC/MS and GC/MS platforms is constantly increasing.
Preprocessing involves peak detection; peak grouping and annotation (for LC/MS data) or spectral deconvolution
(for GC/MS data); and peak alignment. Existing software tools invariably yield an immense number of false-positive
and false-negative peaks, produce inaccurate peak groups, misalign detected peaks, and extract inaccurate
relative quantitation information for metabolites. These errors can translate downstream into spurious or missing
compound identifications and cause misleading interpretations of the metabolome. Furthermore, users need to
specify a large number of parameters for existing software tools to work. Unfortunately, general users usually
do not understand how to optimize these parameters, and maximizing one aspect (e.g., sensitivity) often has
deleterious effects on another (e.g., specificity). We will address these challenges by developing more accurate
algorithms for improving the rigor and reproducibility of data preprocessing. The proposed algorithms will be
implemented in Java and integrated with the widely used MZmine 2, making the software cross-platform and
user-friendly with rich visualization capabilities. In addition, the implementation will be optimized for memory
efficiency and computing speed, allowing large-scale data preprocessing. Extensive testing of the software will be
conducted in close collaboration with metabolomics core facilities and users around the world.
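
The trade-off the abstract describes between sensitivity and specificity can be illustrated with a minimal sketch. This is not the project's algorithm and not MZmine code: it assumes a toy detector that reports local maxima above an intensity threshold, and the `detect_peaks` and `score` functions, the signal values, and the "true" peak positions are all hypothetical.

```python
def detect_peaks(intensities, threshold):
    """Return indices of local maxima whose intensity is at least `threshold`."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        left, mid, right = intensities[i - 1], intensities[i], intensities[i + 1]
        if mid >= threshold and mid > left and mid >= right:
            peaks.append(i)
    return peaks


def score(detected, true_peaks):
    """Count true positives, false positives, and false negatives."""
    detected, true_peaks = set(detected), set(true_peaks)
    tp = len(detected & true_peaks)
    fp = len(detected - true_peaks)
    fn = len(true_peaks - detected)
    return tp, fp, fn


# Hypothetical chromatogram: assumed real peaks at indices 3 and 9,
# assumed noise bumps at indices 6 and 12.
signal = [1, 2, 5, 40, 6, 2, 8, 3, 4, 12, 5, 2, 7, 1]
true_peaks = [3, 9]

for threshold in (30, 10, 5):
    tp, fp, fn = score(detect_peaks(signal, threshold), true_peaks)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn}")
```

On this toy signal, the strictest threshold misses the weaker real peak (a false negative), while the loosest threshold recovers both real peaks but also reports both noise bumps (false positives). A single user-chosen parameter thus shifts errors from one type to the other rather than removing them.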

## Key facts

- **NIH application ID:** 10234033
- **Project number:** 5U01CA235507-04
- **Recipient organization:** UNIVERSITY OF NORTH CAROLINA CHARLOTTE
- **Principal Investigator:** Xiuxia Du
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $324,528
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2018-09-19 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10234033

## Citation

> US National Institutes of Health, RePORTER application 10234033, Cross-Platform and Graphical Software Tool for Adaptive LC/MS and GC/MS Metabolomics Data Preprocessing (5U01CA235507-04). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10234033. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
