# ConProject-001

> **NIH R01** · CORNELL UNIVERSITY · 2021 · $357,363

## Abstract

Complex biological processes, including organ development, immune response and disease progression,
are inherently dynamic. Learning their regulatory architecture requires understanding how
components of a large system dynamically interact with each other and give rise to emergent behavior.
Recent experimental advances have made it possible to investigate these biological systems in a
data-driven fashion at high temporal resolution, allowing identification of new genes and their regulatory
interactions. Longitudinal omics data sets are becoming increasingly common in clinical practice
as well. Information on these collections of interacting genes can be integrated to gain systems-level
insights into the roles of biological pathways and processes, including progression of diseases. Consequently,
developing interpretable methods for learning functional relationships among genes, proteins
or metabolites from high-dimensional time series data has become a timely research problem.
The nature of these time-course data sets presents exciting opportunities and interesting challenges
from a statistical perspective. Typical time-course omics data sets are challenging because of
their high dimensionality and the non-linear relationships among system components. To tackle these challenges,
one needs sophisticated dimension-reduction techniques that are biologically meaningful, computationally
efficient and allow uncertainty quantification. Methods that incorporate prior biological
information (e.g., pathway membership, protein-protein interactions) into the data analysis are good
candidates for analyzing such high-dimensional systems with small sample sizes.
Here, we will develop three core methods to address the above challenges: (Aim 1) an empirical
Bayes framework for clustering high-dimensional omics time-course data using prior biological knowledge;
(Aim 2) a quantile-based Granger causality framework for learning interactions among genes
or metabolites from their lead-lag relationships; and (Aim 3) a decision tree ensemble framework for
searching for cascades of interactions among genes from their temporal expression profiles. Our interdisciplinary
team of statisticians and scientists will analyze time-course omics data from three research
projects: (i) innate immune response systems in Drosophila, (ii) developmental processes in mouse models,
and (iii) longitudinal metabolite profiling of TB patients. These analyses will be used to build and
validate our methodology, which will be implemented in publicly available software. This proposal is
innovative in its incorporation of prior biological knowledge into novel dimension-reduction
techniques for interrogating high-dimensional time-course omics data. This research is significant
because it will benefit basic science by generating data-driven, testable hypotheses about the regulatory architecture
of biological processes, and clinical practice by helping to monitor disease progression and prognosis.
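Aim 2 builds on the Granger idea that a gene or metabolite X is a candidate regulator of Y if the past values of X improve prediction of Y beyond what Y's own past provides. As a minimal sketch of that lead-lag logic only, the Python below computes the classical least-squares (mean-based) Granger F statistic with NumPy. It is not the quantile-based framework proposed in Aim 2, which would replace the squared-error fits with quantile (check-loss) regressions. The function name `granger_f_stat` and the simulated data are hypothetical illustrations.

```python
import numpy as np


def granger_f_stat(x, y, lag=1):
    """F statistic for whether past values of x help predict y beyond y's own past.

    Hypothetical helper: classical least-squares Granger comparison with `lag`
    lags, NOT the quantile-based version proposed in Aim 2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[lag:]
    # Column k holds the series shifted back by k + 1 time points.
    y_lags = np.column_stack([y[lag - k - 1:n - k - 1] for k in range(lag)])
    x_lags = np.column_stack([x[lag - k - 1:n - k - 1] for k in range(lag)])
    ones = np.ones((n - lag, 1))
    restricted = np.hstack([ones, y_lags])        # y_t ~ past of y only
    full = np.hstack([ones, y_lags, x_lags])      # y_t ~ past of y and x

    def rss(design):
        # Residual sum of squares of an ordinary least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    df_num = lag                                  # number of added x lags
    df_den = (n - lag) - full.shape[1]            # residual degrees of freedom
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)


# Hypothetical simulated example: y responds to x one time point later.
rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
y = np.empty(n)
y[0] = rng.standard_normal()
y[1:] = 0.8 * x[:-1] + 0.3 * rng.standard_normal(n - 1)

print("x -> y:", granger_f_stat(x, y))
print("y -> x:", granger_f_stat(y, x))
```

Because `x` drives `y` with a one-step delay in the simulated data, the x-to-y statistic should be far larger than the y-to-x one. In practice each statistic would be compared with an F(lag, n - lag - (2*lag + 1)) reference distribution, and a mean-based test like this one can miss regulation that acts only in the tails of the distribution, which is one motivation for a quantile-based formulation.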

## Key facts

- **NIH application ID:** 10242092
- **Project number:** 5R01GM135926-03
- **Recipient organization:** CORNELL UNIVERSITY
- **Principal Investigator:** Sumanta Basu
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $357,363
- **Award type:** 5
- **Project period:** 2019-09-23 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10242092

## Citation

> US National Institutes of Health, RePORTER application 10242092, ConProject-001 (5R01GM135926-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10242092. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
