PROJECT SUMMARY
Vast numbers of mutations in human protein coding sequences have been identified by DNA sequencing, but for only a tiny percentage of these variants do we understand the biochemical basis for any defect. This project seeks to develop an approach to characterize the biochemical activities of protein variants, such as thermostability, post-translational modification, kinetics and catalysis, at high throughput by exploiting the ease and scale of DNA sequencing. Knowledge of biochemical activities can be used to annotate clinically relevant proteins, for many of which nearly all variants are either unannotated or classified as variants of uncertain significance with respect to pathogenicity. In the proposed approach, each variant will be covalently linked in vivo to a unique RNA barcode by fusion to a tRNA-modifying enzyme that recognizes and couples to a short stem-loop RNA sequence. The stem-loop will contain a barcode sequence that identifies the variant. The barcodes allow a pool of variants to be subjected to a biochemical assay, with the results read out by DNA sequencing of the barcodes. Because the variant proteins are expressed in vivo, they can fold or become post-translationally modified in their native cellular environment; bind to other cellular proteins or ligands; become modified upon cellular perturbation; or be synthesized in a variety of hosts, including human cultured cells. As proof-of-concept applications of this method, in Aim 1, variants of dihydrofolate reductase will be assessed for thermal stability. The variants will be fused to the tRNA-modifying enzyme and expressed in E. coli; the fusion proteins will be purified in a pooled format; and aliquots of the proteins will be heated to a range of temperatures, followed by purification of the soluble, undenatured protein (thermal proteome profiling).
The number of barcode sequence reads recovered for each variant in the soluble fraction at each temperature allows its melting temperature to be determined. In Aim 2, we will develop this method in mammalian cells to quantify the abundance of protein variants of thiopurine methyltransferase. In summary, we will develop a high-throughput in vivo protein barcoding method to study fundamental biochemical properties of variant proteins that have so far remained inaccessible to current methods.
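
As an illustration of how barcode read counts from the soluble fraction might be converted into a melting temperature in Aim 1, the following is a minimal sketch and not the project's analysis pipeline. It assumes that read counts for one barcode have already been normalized for sequencing depth, that the lowest temperature represents fully soluble protein, and that the melting temperature can be approximated as the point where the soluble fraction crosses 0.5 by linear interpolation. The function names and the example counts are hypothetical; a sigmoid fit would be a natural alternative.

```python
from typing import List, Optional


def soluble_fractions(counts: List[float]) -> List[float]:
    """Scale per-temperature read counts by the lowest-temperature count.

    Assumption: counts are ordered by increasing temperature and are
    already normalized for sequencing depth.
    """
    baseline = counts[0]
    if baseline <= 0:
        raise ValueError("baseline read count must be positive")
    return [c / baseline for c in counts]


def melting_temperature(temps: List[float], counts: List[float]) -> Optional[float]:
    """Estimate Tm as the temperature at which the soluble fraction
    first drops through 0.5, using linear interpolation between the
    two flanking temperature points. Returns None if no crossing is seen.
    """
    if len(temps) != len(counts):
        raise ValueError("temps and counts must have the same length")
    fracs = soluble_fractions(counts)
    for i in range(len(temps) - 1):
        f0, f1 = fracs[i], fracs[i + 1]
        if f0 >= 0.5 > f1:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None


# Made-up counts for a single hypothetical barcode across five temperatures (deg C).
example_temps = [37.0, 45.0, 53.0, 61.0, 69.0]
example_counts = [1000.0, 950.0, 700.0, 300.0, 50.0]
example_tm = melting_temperature(example_temps, example_counts)
```

With these made-up counts the soluble fraction falls from 0.7 at 53 °C to 0.3 at 61 °C, so the interpolated estimate lies midway between the two, at about 57 °C. Applying the same function to every barcode in the pool would give one estimate per variant.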