Single-molecule protein sequencing by iterative isolation and identification of N-terminal amino acids

NIH RePORTER · NIH · R43 · $409,295

Abstract

Proteins are responsible for much of the structure and function of all cells. Subtle changes in the expression of various protein forms are critical for proper growth and development, but irregularities can cause deleterious cellular effects or large-scale biological dysfunction. Proteins consist of chains of amino acids, which ultimately determine the three-dimensional structure and functionality of the protein. As such, the ability to determine the entire amino acid sequence of low-abundance proteins can greatly accelerate research into protein function and biology. However, in stark contrast to the relative success of DNA sequencing technologies, there is currently no efficient and cost-effective strategy to sequence single protein molecules at single-amino-acid resolution. Two methods are commercially available for protein sequencing. The first method, “Edman degradation”, requires purification of the target protein. Bulk quantities of whole protein or purified fragments are sequenced by cleaving off the first (N-terminal) amino acid and chemically identifying it. The second method, based on mass spectrometry, requires enzymatically degrading a single protein or mixture of proteins into small fragments, then analyzing the molecular mass and charge of each fragment. This information is compared to that of known protein sequences to infer the identity of the input proteins. Both of these commercially available methods suffer from low sensitivity, requiring ~1 million molecules of each protein for detection. Edman degradation cannot currently be used on heterogeneous protein mixtures, further limiting its utility. Critical hurdles in single-molecule protein sequencing are the number and diversity of amino acids, as well as the interactions between amino acids that interfere with reagents that can identify amino acids by their chemical side chains.
Current approaches being developed for single-molecule protein sequencing could avoid some of these issues by employing harsh denaturation agents, but these can compromise the identification systems themselves. In addition, denaturation agents remove only some of the intramolecular interactions of proteins. Glyphic Biotechnologies has developed a novel strategy to iteratively identify the first (N-terminal) amino acid by isolating it from the remainder of the protein, using a linker molecule called ClickP. After the protein is bound to a solid surface, ClickP enables single-molecule protein sequencing through an iterative cycle of physically isolating the terminal amino acid, then enabling its identification at high specificity and single-molecule sensitivity. The approach has the potential to be scaled to sequence millions to billions of single molecules simultaneously in hours. Developing this technology will revolutionize protein analysis by making large-scale protein sequencing feasible, inexpensive, and routine.
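
The iterative logic described above (isolate the N-terminal amino acid, identify it, then repeat on the shortened chain) can be illustrated with a toy Python sketch. This is only a conceptual model under stated assumptions, not Glyphic's method or chemistry: the protein is represented as a string of one-letter amino acid codes, and the names `read_n_terminal_cycles` and `perfect_identifier` are hypothetical, introduced here purely for illustration.

```python
from typing import Callable, List, Optional


def read_n_terminal_cycles(
    protein: str,
    identify: Callable[[str], str],
    max_cycles: Optional[int] = None,
) -> List[str]:
    """Toy model of iterative N-terminal sequencing.

    Each cycle splits the first (N-terminal) residue from the rest of the
    chain and passes that isolated residue to `identify`. The chemistry of
    isolation and identification is NOT modeled; `identify` is a placeholder.
    """
    remaining = protein
    calls: List[str] = []
    while remaining and (max_cycles is None or len(calls) < max_cycles):
        # "Isolate" the N-terminal residue from the remainder of the chain.
        n_terminal, remaining = remaining[0], remaining[1:]
        # "Identify" the isolated residue on its own.
        calls.append(identify(n_terminal))
    return calls


def perfect_identifier(residue: str) -> str:
    """Hypothetical error-free identifier: returns the residue unchanged."""
    return residue


if __name__ == "__main__":
    # Arbitrary example peptide, chosen only for illustration.
    print("".join(read_n_terminal_cycles("MKTAYIAK", perfect_identifier)))
```

Because identification always acts on a residue that has already been separated from the chain, the placeholder `identify` function never sees neighboring residues, which loosely mirrors the abstract's point about avoiding interference from interactions between amino acids.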

Key facts

NIH application ID: 10498892
Project number: 1R43HG012563-01
Recipient: GLYPHIC BIOTECHNOLOGIES, INC.
Principal Investigator: Daniel Masao Estandian
Activity code: R43
Funding institute: NIH
Fiscal year: 2022
Award amount: $409,295
Award type: 1
Project period: 2022-09-02 → 2023-08-31