# Feature Engineering to Infer Proteomic Changes from mRNA Data

> **NIH R21** · UNIVERSITY OF COLORADO DENVER · 2024 · $435,406

## Abstract

Proteins are the molecules that carry out the majority of biological function. Although mRNA levels can be
measured at scale and have been transformative for understanding gene expression in large cohorts, mRNA
levels correlate only partially with protein levels in a system. As a result, some differentially expressed genes
from transcriptomics experiments may not be informative for the abundance of their proteins, leaving their
functional significance difficult to interpret and leading to a loss of information that impedes the translation of
'omics' experiments to biological knowledge. The recent availability of large matching transcriptomics and
proteomics data has created new avenues to predict protein level changes from mRNA profiles using machine
learning methods. Results from these efforts have highlighted the prevalence of post-transcriptional regulation
of the proteome, where the abundance of a protein species in a sample is often determined not only by its own
coding mRNA but also by the abundance of other mRNAs in the transcriptome, including many of those coding for
its protein-protein interaction partners. Accordingly, this project aims to explore new strategies that capture
protein-protein relationships to enhance our current capability to infer protein-level changes from mRNA
abundance measurements. Specifically, Aim 1 will explore the use of conceptual embeddings of proteins to
create low-dimensional vectors that capture relevant protein information on: (1) the topology of protein-protein
interaction networks measured in large mass spectrometry experiments, and (2) protein sequence, domain, and
structure information; and then evaluate their utility for capturing the relevant protein neighborhoods that aid in
the prediction of proteomic changes from mRNA abundance. In parallel, Aim 2 will disseminate these
technological advances by building software tools and web apps that apply the pre-trained models to new
user-supplied mRNA sequencing results. These tools are designed to assist in the prioritization and
interpretation of gene lists from sequencing experiments. The models will be validated by mass spectrometry
and immunoblot experiments. If successful, the proposed work will lead to broadly applicable software tools
that can enhance the utility and interpretation of transcriptomics and proteomics experiments. It may also yield
new insights into the biological factors that contribute to the lack of correlation between mRNA and protein levels.
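
The idea behind Aim 1 can be illustrated with a small sketch. It is not the project's method: the abstract does not specify an embedding technique or a model, so a spectral embedding of a toy interaction network and a closed-form ridge regression are used here as stand-ins. All data are synthetic, and every name (`spectral_embedding`, `nearest_neighbors`, `ridge_fit_predict`, the complex sizes, and the mixing weights) is hypothetical. The sketch embeds each protein as a low-dimensional vector, picks its nearest neighbors in that space, and compares predicting protein abundance from the protein's own mRNA alone with predicting it from its own mRNA plus its neighbors' mRNAs.

```python
import numpy as np


def spectral_embedding(adj, dim):
    """Low-dimensional node vectors from the top eigenvectors of the
    symmetrically normalized adjacency matrix (a stand-in embedding)."""
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    norm_adj = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm_adj)  # eigenvalues come back in ascending order
    return vecs[:, -dim:]


def nearest_neighbors(emb, k):
    """Indices of the k closest other nodes (Euclidean distance) per node."""
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    mu_x, mu_y = x_train.mean(axis=0), y_train.mean()
    xc = x_train - mu_x
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ (y_train - mu_y))
    return (x_test - mu_x) @ weights + mu_y


def mse(pred, truth):
    return float(np.mean((pred - truth) ** 2))


rng = np.random.default_rng(0)

# Toy interaction network (assumption): two fully connected 6-member complexes.
n_genes, complex_size = 12, 6
adj = np.zeros((n_genes, n_genes))
for start in range(0, n_genes, complex_size):
    adj[start:start + complex_size, start:start + complex_size] = 1.0
np.fill_diagonal(adj, 0.0)

# Synthetic data (assumption): each protein level depends on its own mRNA
# and on the mean mRNA level of its complex partners, plus noise.
n_samples = 400
mrna = rng.normal(size=(n_samples, n_genes))
partner_mean = (mrna @ adj) / adj.sum(axis=0)
protein = 0.4 * mrna + 0.6 * partner_mean + 0.1 * rng.normal(size=mrna.shape)

emb = spectral_embedding(adj, dim=2)
nbrs = nearest_neighbors(emb, k=5)

train, test = slice(0, 300), slice(300, None)
target = 0
y = protein[:, target]
x_own = mrna[:, [target]]                                  # own mRNA only
x_nbr = mrna[:, np.concatenate(([target], nbrs[target]))]  # own + neighbor mRNAs

mse_own = mse(ridge_fit_predict(x_own[train], y[train], x_own[test]), y[test])
mse_nbr = mse(ridge_fit_predict(x_nbr[train], y[train], x_nbr[test]), y[test])
print(f"own-mRNA-only test MSE:     {mse_own:.4f}")
print(f"own + neighbor mRNAs MSE:   {mse_nbr:.4f}")
```

On this synthetic data the neighbor-aware model has the lower test error by construction, since the simulated protein levels were generated from partner mRNAs. Whether the same holds on real matched transcriptomics and proteomics data is the question the project sets out to test.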

## Key facts

- **NIH application ID:** 10949467
- **Project number:** 1R21HG013684-01
- **Recipient organization:** UNIVERSITY OF COLORADO DENVER
- **Principal Investigator:** Edward Lau
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $435,406
- **Award type:** 1
- **Project period:** 2024-09-23 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10949467

## Citation

> US National Institutes of Health, RePORTER application 10949467, Feature Engineering to Infer Proteomic Changes from mRNA Data (1R21HG013684-01). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10949467. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*