Uncovering the Human Secretome

NIH RePORTER · NIH · DP1 · $1,186,500 · view on reporter.nih.gov ↗

Abstract

PROJECT SUMMARY / ABSTRACT Peptide hormones regulate embryonic development and most physiological processes by acting as endocrine or paracrine signals. They are also a rich source of relatively safe medicines to treat both common and rare diseases. Yet finding peptide-coding genes below ~300 base pairs is inherently difficult because they lie within the noise of the genome. Recent multidisciplinary, proteophylogenomic studies in lower species, such as yeast and flies, have uncovered hundreds of new small protein-coding genes called “smORFs”. In humans, recent work on the mitochondrial genome has also uncovered dozens of small peptide hormone genes called MDPs. Based on these and other studies, it is estimated that about 5% of proteins in the human nuclear genome have not yet been discovered, particularly those that encode small peptides below 100 amino acids. It is a well documented but rarely challenged practice to discard large quantities of sequencing and proteomic data because they do not match the annotated human genome. My overarching goal is to discover the human “secretome” and make practical use of it to improve the human condition. Over the past few years, we have developed a unique pipeline of technologies that combines breakthroughs in math, computer hardware and software, proteomics, mass spectrometry, and HTS screening, each of which has been optimized and integrated. Our GeneFinder software modules, based on machine-learning, can process data 100 times faster than traditional methods and rapidly validate small human genes using public and in-house generated databases of genetic and proteomic data. Using the prototype version of the platform that finds conservation between humans, chimp, and macaque, we have discovered thousands of putative peptide-coding genes and validated hundreds of them. We aim to (1) further improve the algorithm to increase its speed and accuracy, (2) improve the genome annotation for thousands of small novel genes, (3) determine their expression profiles in normal and diseased tissues, (4) explore their genetic association with disease loci, and (5) screen the first secretomic library to find hormones with novel biological and therapeutically relevant activities. The data, the software package, and libraries will be made available to the research community. In doing so, we will shed light on the dark matter of the human genome, the parts with the greatest therapeutic potential, thereby helping to steer and accelerate the pace of research and drug development for generations to come.

Key facts

NIH application ID
9928344
Project number
5DP1AG058605-04
Recipient
HARVARD MEDICAL SCHOOL
Principal Investigator
DAVID A. SINCLAIR
Activity code
DP1
Funding institute
NIH
Fiscal year
2020
Award amount
$1,186,500
Award type
5
Project period
2017-09-15 → 2022-05-31