A Real-Time AI-Driven High-Throughput Proteomics Data Acquisition Method for Clinical Applications

NIH RePORTER · NIH · R61 · $234,218

Abstract

SUMMARY: Cancer is caused by dynamics of the genome, which ultimately translate into aberrations of the proteome, which constitutes the major functional and structural components of a cell. The proteome is highly complex, owing to post-translational protein modifications, tightly regulated protein degradation, and functional regulation through protein-protein interaction networks. It is also considered the closest molecular link to a biological system’s phenotype. Mass spectrometry is among the most important tools to characterize proteomes, and its versatility is well suited to match the proteome’s complexity. It is therefore surprising that the information the cancer proteome offers for understanding and diagnosing cancer remains almost entirely untapped in clinical studies. One reason is the low sample throughput of mass spectrometry-based proteomics compared to genomics technologies, which translates into higher analysis costs and reduced access to proteomics. Our overarching aim for this proposal is to develop a novel mass spectrometry-based proteomics data acquisition method that increases the sample throughput of deep proteome mapping (>2,000 proteins from blood plasma, >8,000 proteins from tissue samples) by up to tenfold compared to current methods (10 min per sample). The method is based on multiplexed isobaric proteomics, a barcoding technology that currently allows the simultaneous analysis of up to 18 samples. The novel aspect is the use of artificial intelligence (AI) to drive the data acquisition process. The proteomics community has started to incorporate AI into its data analysis workflows, but AI has not yet been used to improve data acquisition. Our AI system directs the mass spectrometer in real time to rapidly and globally target all proteins assumed to be in a sample.
Proteome samples are digested into peptides, and a combination of neural networks trained on millions of mass spectra predicts peptide analyte behavior in real time to maximize analytical speed at high analytical depth. A preliminary version of the method maps 1,300 plasma proteins in 10 min per sample. In Aim 1, we propose to further improve the method with additional neural networks that enable more sensitive real-time peptide identification and the simultaneous identification of multiple peptides. Our goal is a method that routinely quantifies 2,000 proteins from human plasma in 10 minutes. The method will be incorporated into a platform that also includes low-cost automated sample preparation to achieve an overall analysis cost of <$100 per sample. In Aim 2, we propose to evaluate the method by mapping the proteomes of 500 clinical plasma samples from lung cancer patients with different pathological cancer stages. Our preliminary data analysis shows a high predictive power of mass spectrometry-b...
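To put the stated throughput figures in concrete terms, the following is a rough back-of-the-envelope sketch, not part of the proposal itself. It assumes that the 10 min per sample figure is amortized across one multiplexed run and that all 18 channels carry study samples (no reference or bridge channels); both are assumptions, and the function name `multiplexed_run_plan` is hypothetical.

```python
import math


def multiplexed_run_plan(n_samples, plex=18, minutes_per_sample=10.0):
    """Estimate instrument time for a multiplexed study.

    Assumptions (not stated in the abstract): every channel of a
    multiplexed run holds a study sample, and the per-sample time is
    the run time divided by the number of channels.
    """
    runs = math.ceil(n_samples / plex)            # multiplexed runs needed
    minutes_per_run = plex * minutes_per_sample   # instrument time per run
    total_minutes = runs * minutes_per_run        # total instrument time
    return runs, minutes_per_run, total_minutes


# The 500 plasma samples planned for Aim 2, at up to 18 samples per run.
runs, per_run, total = multiplexed_run_plan(500)
print(f"{runs} runs of {per_run:.0f} min each, {total / 60:.0f} h in total")
```

Under these assumptions, the 500-sample study in Aim 2 would need 28 multiplexed runs of 180 minutes each, or about 84 hours of instrument time.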

Key facts

NIH application ID
10797816
Project number
1R61CA287026-01
Recipient
MASSACHUSETTS GENERAL HOSPITAL
Principal Investigator
Wilhelm Haas
Activity code
R61
Funding institute
NIH
Fiscal year
2024
Award amount
$234,218
Award type
1
Project period
2024-02-12 → 2027-01-31