# Statistical methods for higher order dependences to understand protein functions

> **NIH R01** · COLORADO STATE UNIVERSITY · 2021 · $230,993

## Abstract

This proposal brings together a strong team from molecular science and statistics to tackle the important
problem of how to integrate protein structure and sequence information in complex systems. Some of the
most important characteristics of these data are the strong correlations buried within them, with the
pairwise correlations in the sequence data already being routinely used to predict structural contacts. Here,
we are developing novel ways to extract higher-order dependences from huge data sets, which is now
possible given the large volumes of sequence data available from genomics. In addition, such higher-order
dependences are directly observable in protein structures, where groups of amino acids interact directly.
Importantly, these higher-order dependences reflect the dense physical environment in the cell
that requires proper statistical characterization. A new model-free
information-theoretic measure is introduced to quantify the higher-order dependences, which serves as the
central method in this project. By identifying the major challenges in drawing statistical inference based on
this measure, we develop, evaluate, and improve a new statistical inference and computational framework
for analyses of higher-order dependences in discrete data of a general type, motivated by protein
multiple sequence data. The new computationally efficient framework makes it possible to discover reliable
higher-order dependences while quantifying their uncertainty. The preliminary data here combine the
information from sequences and structures to yield unexpected results that immediately relate to the
dynamics of the protein structures. The outcome is an entirely new approach to handle the large volumes
of protein sequence data and other omics data now available and the enormous volumes about to arrive on
the doorsteps of omics analysts.
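
To make the distinction between pairwise and higher-order dependence concrete, here is a minimal sketch in Python. The abstract does not specify the project's information-theoretic measure, so this example substitutes total correlation (multi-information), a standard textbook quantity, purely for illustration. The function names and the toy alignment `toy_msa` are hypothetical, and the plug-in entropy estimator used here is known to be biased for small samples, which is one reason careful statistical inference is needed in practice.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Plug-in (empirical) Shannon entropy, in bits, of a list of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def column(msa, j):
    """Return column j of an alignment given as equal-length strings."""
    return [seq[j] for seq in msa]


def mutual_information(msa, i, j):
    """Pairwise mutual information I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j)."""
    xi, xj = column(msa, i), column(msa, j)
    return entropy(xi) + entropy(xj) - entropy(list(zip(xi, xj)))


def total_correlation(msa, cols):
    """Total correlation: sum of marginal entropies minus the joint entropy.

    Illustrative stand-in only; NOT the measure proposed in the project.
    """
    marginals = sum(entropy(column(msa, j)) for j in cols)
    joint = entropy([tuple(seq[j] for j in cols) for seq in msa])
    return marginals - joint


# Hypothetical toy alignment: the third position is 'A' exactly when the
# first two positions agree, so no single pair of columns is informative.
toy_msa = ["AAA", "ACC", "CAC", "CCA"]

for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"MI({i},{j}) = {mutual_information(toy_msa, i, j):.3f} bits")
print(f"TC(0,1,2) = {total_correlation(toy_msa, [0, 1, 2]):.3f} bits")
```

In this toy alignment every pairwise mutual information is zero, yet the three columns jointly carry one bit of dependence, so a purely pairwise analysis would miss the interaction entirely.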

## Key facts

- **NIH application ID:** 10378307
- **Project number:** 1R01GM144961-01
- **Recipient organization:** COLORADO STATE UNIVERSITY
- **Principal Investigator:** Wen Zhou
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $230,993
- **Award type:** 1
- **Project period:** 2021-09-23 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10378307

## Citation

> US National Institutes of Health, RePORTER application 10378307, Statistical methods for higher order dependences to understand protein functions (1R01GM144961-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10378307. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
