# Defining the modular architecture of protein intrinsically disordered regions for a predictive understanding of biological function

> **NIH R01** · UT SOUTHWESTERN MEDICAL CENTER · 2024 · $337,949

## Abstract

Protein sequences can be broadly categorized into two classes: those which adopt stable secondary structure
and fold into a domain (i.e., globular proteins), and those that do not. This latter class of sequences is
conformationally heterogeneous and are described as being intrinsically disordered. Structural biology has
enabled the development of bioinformatic approaches that can sub-classify globular sequences by domain type,
an approach that has revolutionized how we understand and predict protein functionality. Conversely, it is
unknown if protein intrinsically disordered regions (IDRs), which cannot be resolved by structural biology, are
subject to broadly generalizable organizational principles that would enable their sub-classification and the a
priori prediction of function.
Protein Low Complexity Domains (LCDs) are a class of IDRs enriched in a small subset of amino acids. By
simply gazing at LCDs, local sequence biases are evident. We hypothesized that the non-uniform distribution of
amino acids within LCDs may be a conspicuous manifestation of a more general organizational principle widely
operative in IDRs. We therefore developed a statistical approach that quantifies linear variance in amino acid
composition across a sequence. This algorithm has led to the surprising discovery that IDRs are non-randomly
organized into juxtaposed modules of distinct compositional bias. This type of sequence organization is present
across the three domains of life and in IDRs of both low and high sequence complexity. Our data show that this
sequence organizational principle is broadly operative and suggest a hitherto unappreciated level of logic and
interpretability in this enigmatic class of sequences. Motivated by these observations, this proposal seeks to use
the logic of modularity to comprehensively classify IDRs, develop a predictive understanding of IDR function,
and define how the modular architecture of disordered sequences impacts their conformation and function.
In Aim 1 we will use computation and quantitative metrics to categorically cluster modules in order both to assess
the evolutionary diversification of IDRs and to relate module types (and thus IDRs) to specific functions.
In Aim 2 and Aim 3 we will undertake in vitro and in vivo functional studies targeting two model disordered
sequences. Specifically, we will determine how the unique modular structure of a DNA-binding IDR and a
desiccation-tolerant IDR relates to their in vitro conformation and function using biochemical assays and related
in cellula approaches. Altogether, these studies will provide the first comprehensive classification of IDRs to
enable a priori functional predictions for disordered sequences, and the proposed functional studies will delineate
how modularity impacts conformation and function. Overall, this work will usher in a new level of interpretability
and predictability in our understanding of the functional mechanism of IDRs.
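
The abstract mentions a statistical approach that quantifies linear variance in amino acid composition across a sequence, but it does not give the algorithm itself. The following is only a minimal sketch of one way such a quantity could be computed, assuming a fixed-size sliding window and per-residue population variance of windowed composition. The function names, the window size, and the choice of variance as the statistic are all illustrative assumptions, not the investigators' method.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_compositions(seq, window=10, step=1):
    """Return the fractional composition of each sliding window.

    Hypothetical helper: `window` and `step` are illustrative defaults.
    """
    seq = seq.upper()
    comps = []
    for start in range(0, len(seq) - window + 1, step):
        counts = Counter(seq[start:start + window])
        comps.append({aa: counts.get(aa, 0) / window for aa in AMINO_ACIDS})
    return comps


def compositional_variance(seq, window=10, step=1):
    """Population variance of each residue's windowed fraction along the sequence.

    A high value for a residue means its local abundance changes along the
    sequence, as expected when compositionally distinct segments sit side by side.
    """
    comps = window_compositions(seq, window, step)
    if not comps:
        # Sequence shorter than one window: no variance can be measured.
        return {aa: 0.0 for aa in AMINO_ACIDS}
    n = len(comps)
    result = {}
    for aa in AMINO_ACIDS:
        values = [c[aa] for c in comps]
        mean = sum(values) / n
        result[aa] = sum((v - mean) ** 2 for v in values) / n
    return result


# Two toy sequences with identical overall composition (50% S, 50% K).
modular = "S" * 20 + "K" * 20   # two juxtaposed blocks
mixed = "SK" * 20               # evenly interleaved

modular_var = compositional_variance(modular)
mixed_var = compositional_variance(mixed)
```

In this toy comparison the blocked sequence yields a positive variance for serine and lysine, while the interleaved sequence yields zero, even though both have the same overall composition. A real analysis would also need a null model (for example, shuffled sequences) to judge whether an observed variance is non-random, which this sketch omits.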

## Key facts

- **NIH application ID:** 10940573
- **Project number:** 1R01GM155100-01
- **Recipient organization:** UT SOUTHWESTERN MEDICAL CENTER
- **Principal Investigator:** Matthew W. Parker
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $337,949
- **Award type:** 1
- **Project period:** 2024-09-01 → 2029-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10940573

## Citation

> US National Institutes of Health, RePORTER application 10940573, Defining the modular architecture of protein intrinsically disordered regions for a predictive understanding of biological function (1R01GM155100-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10940573. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
