# Methods for Integrative Genomic Data Analysis

> **NIH R01** · UNIVERSITY OF PENNSYLVANIA · 2021 · $430,814

## Abstract

 The broad, long-term objective of this project concerns the development of novel statistical methods, theory
and computational tools for statistical modeling of large-scale multiple high-dimensional genomic data motivated
by important biological questions and experiments. New high-throughput technologies and next-generation sequencing are generating various types of very high-dimensional genetic, genomic, epigenomic, and metabolomic data
in order to obtain an integrative understanding of various complex phenotypes. As the types and complexity of
the data increase and as the questions being addressed become more sophisticated, statistical methods that can
both integrate these genomic data and incorporate information about gene function and pathways are required in
order to draw valid statistical and biological inferences. The specific aims of the current project are to develop new
statistical models and methods for causal integrative analysis of eQTL data with genome-wide genetic association
data (GWAS) in order to identify the possible causal genes and pathways for disease phenotypes. Motivated by
analysis of diverse genomic data, the first aim is to develop novel causal mediation analysis methods to identify the
genes that mediate the effects of genetic variants on disease phenotypes by constructing gene regulatory networks
based on eQTL data. Aim 2 is to develop high-dimensional instrumental variables (HDIV) regression models in
order to identify the phenotype-causing genes using eQTLs as possible instrumental variables. Aim 3 develops
methods for estimating the genetic relatedness between disease phenotype and gene expressions in order to identify the possible disease-causing genes and biological pathways. Finally, Aim 4 is to develop statistical methods
that can effectively integrate GTEx data with GWAS association summary statistics in order to identify possible
causal disease genes and pathways. These methods hinge on novel integration of methods for multiple related
high-dimensional regressions and high-dimensional causal inference. The new methods can be applied to different
types of genomic data and will ideally help facilitate the identification of genes and their complex interactions as
well as the biological pathways underlying various complex human diseases. The work proposed here will contribute statistical methodology and theory for modeling high-dimensional genomic data and to studying complex
phenotypes and biological systems and offer insights into each of the biological areas represented by the various
data sets, including Alzheimer's disease, cardiometabolic syndrome, and chronic kidney disease. All algorithms
and software tools developed under this grant and detailed documentation will be made available free-of-charge to
interested researchers.

## Key facts

- **NIH application ID:** 10188561
- **Project number:** 5R01GM129781-04
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Hongzhe Lee
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $430,814
- **Award type:** 5
- **Project period:** 2018-09-01 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10188561

## Citation

> US National Institutes of Health, RePORTER application 10188561, Methods for Integrative Genomic Data Analysis (5R01GM129781-04). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/10188561. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
