Defining the modular architecture of protein intrinsically disordered regions for a predictive understanding of biological function

NIH RePORTER · NIH · R01 · $337,949

Abstract

Protein sequences can be broadly categorized into two classes: those that adopt stable secondary structure and fold into a domain (i.e., globular proteins), and those that do not. This latter class of sequences is conformationally heterogeneous and is described as intrinsically disordered. Structural biology has enabled the development of bioinformatic approaches that sub-classify globular sequences by domain type, an approach that has revolutionized how we understand and predict protein functionality. In contrast, it is unknown whether protein intrinsically disordered regions (IDRs), which cannot be resolved by structural biology, are subject to broadly generalizable organizational principles that would enable their sub-classification and the a priori prediction of function. Protein low complexity domains (LCDs) are a class of IDRs enriched in a small subset of amino acids. Local sequence biases are evident from simple visual inspection of LCDs. We hypothesized that the non-uniform distribution of amino acids within LCDs may be a conspicuous manifestation of a more general organizational principle widely operative in IDRs. We therefore developed a statistical approach that quantifies linear variance in amino acid composition across a sequence. This algorithm has led to the surprising discovery that IDRs are non-randomly organized into juxtaposed modules of distinct compositional bias. This type of sequence organization is present across the three domains of life and in IDRs of both low and high sequence complexity. Our data show that this sequence organizational principle is broadly operative and suggest a hitherto unappreciated level of logic and interpretability in this enigmatic class of sequences.
Motivated by these observations, this proposal seeks to use the logic of modularity to comprehensively classify IDRs, develop a predictive understanding of IDR function, and define how the modular architecture of disordered sequences impacts their conformation and function. In Aim 1 we will use computation and quantitative metrics to categorically cluster modules, both to assess the evolutionary diversification of IDRs and to relate module types (and thus IDRs) to specific functions. In Aim 2 and Aim 3 we will undertake in vitro and in vivo functional studies targeting two model disordered sequences. Specifically, we will determine how the unique modular structure of a DNA-binding IDR and a desiccation-tolerant IDR relates to their in vitro conformation and function using biochemical assays and related in cellula approaches. Altogether, these studies will provide the first comprehensive classification of IDRs to enable a priori functional predictions for disordered sequences, and the proposed functional studies will delineate how modularity impacts conformation and function. This work will usher in a new level of interpretability and predictability in our understanding of the functional mechanisms of IDRs.

Key facts

NIH application ID: 10940573
Project number: 1R01GM155100-01
Recipient: UT SOUTHWESTERN MEDICAL CENTER
Principal Investigator: Matthew W. Parker
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $337,949
Award type: 1
Project period: 2024-09-01 → 2029-07-31