# Systematic Discovery and Analysis of Small Proteins and Small ORFs in Mycobacteria

> **NIH R01** · WADSWORTH CENTER · 2021 · $561,892

## Abstract

The last few decades have seen the birth and maturation of the field of Molecular Biology. Initially, mutant
genes were focal points of genome exploration. Now, entire genomes are routinely sequenced, and the
resident genes are automatically identified by annotation algorithms. Alternatively, proteomic approaches
prepare proteolytic peptides of whole-cell extracts for analysis by mass spectrometry. Each of these
approaches is strongly biased toward large genes: large genes are frequent targets for mutation, long
open reading frames are easily discerned in genomic sequence, and large proteins generate many peptides
for mass spectrometry identification. This unintended bias has also created a large gap in our
understanding of molecular biology.

Recent work in eukaryotes and prokaryotes alike has uncovered multitudes of small genes or their
encoded proteins. The numbers of small proteins (defined here as 50 aa or fewer) rival those of traditionally
large proteins, yet only a handful have been ascribed a function. The goal of this proposal is to propel
this nascent field forward by facilitating both small protein discovery and functional characterization. Our
preliminary data identify specific examples that clearly define cis and trans classes of function for short
open reading frames and small proteins. These early leads will be pursued to fruition, providing the
framework for the expanded rigorous study needed in any new field. We will test an additional subset of
small proteins for function, which we anticipate will reveal functions for each member of this training set,
while also establishing general principles for short open reading frames and small proteins.

We will develop and apply our small protein approaches in mycobacteria. Mycobacteria offer many
advantages for small protein study. Foremost is that they express >1000 small proteins in standard
conditions. An extensive toolkit for modifying, culturing, and analyzing mycobacteria makes them very
tractable. A GC-rich genome provides codon bias selection as one criterion to identify functional small
proteins. Moreover, our findings of small gene/protein function in standard laboratory conditions may
directly provide insights into the biology and pathogenesis of infection. This proposal integrates the
complementary expertise of investigators whose ongoing collaboration has already provided the requisite
groundwork leading to this proposal. Through the proposed Aims, we will identify new functional roles of
encoded mycobacterial small proteins and develop an optimized, small-proteomics pipeline for efficient
application to other bacteria, archaea, and eukaryotes.

## Key facts

- **NIH application ID:** 10221007
- **Project number:** 5R01GM139277-02
- **Recipient organization:** WADSWORTH CENTER
- **Principal Investigator:** KEITH M DERBYSHIRE
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $561,892
- **Award type:** 5
- **Project period:** 2020-08-01 → 2024-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10221007

## Citation

> US National Institutes of Health, RePORTER application 10221007, Systematic Discovery and Analysis of Small Proteins and Small ORFs in Mycobacteria (5R01GM139277-02). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10221007. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
