# Imputing quantitative mass spectrometry proteomics data using non-negative matrix factorization

> **NIH F31** · UNIVERSITY OF WASHINGTON · 2024 · $39,491

## Abstract

Alzheimer's disease (AD) represents an emerging global health threat and is expected to double in prevalence by
2050. AD is a disease of malformed proteins, and significant progress has been made in characterizing the AD proteome
with mass spectrometry. However, data missingness represents a significant barrier to the interpretation of existing
AD mass spectrometry experiments.

Missingness refers to peptides or proteins that are present in the biological sample but are not detected by the mass
spectrometer due to various technical factors. This project will address missingness by developing machine learning
methods for imputing, or estimating, missing values in quantitative mass spectrometry data. The project will develop
two separate imputation methods, one using non-negative matrix factorization and the other using deep neural networks.
These imputation methods will increase the reproducibility and statistical power of mass spectrometry experiments
and will enable new discoveries in existing proteomics experiments. These imputation methods will be applicable to
virtually any kind of mass spectrometry experiment – tandem mass tag, data-dependent acquisition, data-independent
acquisition, spectral counts, label-free quantification, etc. These imputation methods will be released as lightweight,
open-source and easy-to-use software packages and may be incorporated into existing data processing workflows.
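
For illustration only, one common way to impute with non-negative matrix factorization is to fit the factorization on the observed entries alone and then fill each missing entry from the low-rank reconstruction. The sketch below does this with masked multiplicative updates in NumPy. It is not the project's software: the function name `nmf_impute`, the rank, the iteration count, and the toy intensity matrix are all assumptions made for the example.

```python
import numpy as np


def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill NaN entries of a non-negative matrix X using masked NMF.

    Hypothetical helper for illustration: minimizes the squared error
    between X and W @ H over observed entries only, with W, H >= 0.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                 # True where a value was observed
    X_obs = np.where(mask, X, 0.0)      # zero out missing entries
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Strictly positive starting factors keep the updates well defined.
    W = rng.random((n_rows, rank)) + 0.1
    H = rng.random((rank, n_cols)) + 0.1
    for _ in range(n_iter):
        # Multiplicative updates; the mask removes missing entries
        # from both the numerator and the denominator.
        WH = (W @ H) * mask
        W *= (X_obs @ H.T) / (WH @ H.T + eps)
        WH = (W @ H) * mask
        H *= (W.T @ X_obs) / (W.T @ WH + eps)
    recon = W @ H
    # Keep observed values as measured; take missing ones from W @ H.
    return np.where(mask, X, recon)


# Toy example (assumed data): a rank-1 "peptide x run" intensity matrix
# with two entries hidden to mimic missingness.
rng = np.random.default_rng(1)
truth = np.outer(rng.random(6) + 1.0, rng.random(4) + 1.0)
X = truth.copy()
X[0, 1] = np.nan
X[3, 2] = np.nan
filled = nmf_impute(X, rank=1)
```

Because the factors start positive and are only ever multiplied by non-negative ratios, the imputed values cannot go negative, which suits intensity-like quantities. In practice the rank and the stopping rule would need to be chosen by validation on held-out observed values.
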

I will demonstrate the utility of these imputation methods by reanalyzing data from several existing AD proteomic
studies. My imputation methods will identify novel differentially expressed proteins, co-expression modules and AD
biomarkers in these existing datasets. I will also analyze unpublished data-independent acquisition (DIA) proteomics
data derived from AD patient cerebrospinal fluid samples. Here I will focus on identifying biomarkers that differentiate
between patients based on genetic background and co-morbidity status. I will also identify biomarkers of patients with
asymptomatic AD.

The imputation methods developed by this proposal will enable future discoveries by independent AD researchers.
This proposal aligns with the NIA Strategic Direction seeking to "identify and understand the genetic, molecular and
cellular mechanisms underlying the pathogenesis of AD."

## Key facts

- **NIH application ID:** 10830923
- **Project number:** 5F31AG082395-02
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Lincoln Jeffery Harris
- **Activity code:** F31
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $39,491
- **Award type:** 5
- **Project period:** 2023-04-16 → 2026-04-15

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10830923

## Citation

> US National Institutes of Health, RePORTER application 10830923, Imputing quantitative mass spectrometry proteomics data using non-negative matrix factorization (5F31AG082395-02). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10830923. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
