Advancing Causal Inference in Integrative Omics Analysis

NIH RePORTER · NIH · R35 · $354,139

Abstract

Project Summary

Emerging and rapidly progressing technologies can now measure the molecular phenotypes of genes, transcripts, proteins, metabolites, and gut microbiota. These omics data provide an unprecedented level of granularity into both clinical and biological measurements, showing great promise to understand biological mechanisms governing human health and disease, and to uncover the underlying heterogeneities that contribute to disease manifestations. However, many statistical methods used for analysis of omics data only establish associations. These associations may merely represent correlates or consequences of disease processes, and thus may not reveal disease mechanisms or guide therapeutics and clinical care. On the other hand, existing causal inference methods are not adequately equipped to handle the high dimensionality, correlation, and complexity of omics data. The goal of this project is to develop new statistical methods for causal inference that integrate large-scale omics data and implement them in user-friendly open-source software. We will develop a new framework that broadens the scope of mediation analysis to jointly analyze high-dimensional omics mediators, through novel applications of two powerful ideas in statistics and machine learning: sufficient dimension reduction and variational autoencoders. The proposed framework can identify a disentangled representation of key mediation pathways, effectively distilling vital signals from large-scale omics mediators. Moreover, we will develop robust and scalable multivariable Mendelian randomization methods for large-scale omics measures, and then extend these methods to identify shared risk pathways across multiple outcomes. Lastly, we will introduce a novel framework for testing the pairwise causal directions between two omics modalities (e.g., microbiome and metabolites) by leveraging the asymmetry in temporally-ordered data.
To maximize the impact of the proposed methods, we will develop and maintain open-source software for our methods, and integrate our proposed Mendelian randomization methods into two state-of-the-art platforms (MR-Base and MendelianRandomization). This project aims to address the need for robust, rigorous, and computationally efficient causal inference in large-scale omics data, and ultimately transform the potential of massive biomedical data into trustworthy, actionable, and generalizable knowledge to solve public health challenges.

Key facts

NIH application ID
10940873
Project number
1R35GM155070-01
Recipient
UNIVERSITY OF WASHINGTON
Principal Investigator
Ting Ye
Activity code
R35
Funding institute
NIH
Fiscal year
2024
Award amount
$354,139
Award type
1
Project period
2024-09-01 → 2029-07-31