Statistical methods for interpretation of genetic variants by gene regulatory networks

NIH RePORTER · NIH · R35 · $71,343

Abstract

Project Summary

A person’s genome typically contains millions of variants, the differences between that personal genome and the reference human genome. Interpreting how these variants cause disease, and understanding the mechanisms behind their statistical associations with phenotypes, are crucial problems in computational biology and genetics. These problems are difficult to address because over 90% of disease-associated variants lie in non-coding regions, whose regulatory functions are highly specific to cellular context and remain poorly understood. The long-term goal of this project is to explain mechanistically how non-coding genetic variants affect cellular context-dependent gene regulatory networks and influence phenotypes. Expression quantitative trait locus (eQTL) mapping and gene regulatory networks (GRNs) are two common approaches for interpreting the regulatory mechanisms of genetic variants. eQTL mapping connects variants in non-coding regions to genes through a population-based association study. GRNs provide information on the cis-regulatory elements that control context-specific expression of target genes and on the transcription factors that act on these elements. GRN-based variant interpretation is complementary to eQTL mapping and has the potential to overcome three of its limitations: (1) eQTL mapping is biased toward common alleles; (2) it cannot distinguish variants in strong linkage disequilibrium; and (3) its power to detect trans-eQTLs is low. Most previous regulatory analyses based on ENCODE data did not include personal genotype data, and most eQTL mapping studies did not include regulatory information. Joint modelling of eQTLs and GRNs would enable high-accuracy, mechanistic variant interpretation.
However, the datasets required for such analysis (matched gene expression, epigenome, and genotype data from the same individuals) are not available for a large human sample. The available datasets are paired genotype and gene expression data across individuals, such as GTEx, and paired gene expression and epigenomic data across cellular contexts, such as ENCODE. Both types of paired data are also available at the single-cell level. To achieve our long-term goal, we will develop statistical methods that integrate these unmatched datasets (bulk or single-cell) from different sources to (1) infer high-accuracy context-specific GRNs that connect variants, transcription factors, cis-regulatory elements, and target genes; and (2) detect trans-eQTLs that regulate target genes. These methods can be extended to interpret disease-associated variants, identify causal variants, and infer personalized drug response, providing guidance for precision medicine. This project is fundamental to precision medicine and will increase our understanding of how genetic variants contribute to phenotype.
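
As an illustration of the population-based association that eQTL mapping performs, the following is a minimal sketch that regresses one gene's expression on one variant's genotype dosage. It is not the project's method: the function name `eqtl_association`, the dosage coding (0, 1, or 2 copies of the alternate allele), and the toy data are all assumptions made for this example. Real eQTL analyses also adjust for covariates and correct for multiple testing.

```python
import math


def eqtl_association(dosages, expression):
    """Simple linear regression of expression on genotype dosage.

    Hypothetical helper for illustration only.
    Returns (slope, pearson_r, t_statistic); t has n - 2 degrees of freedom.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired samples")

    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")

    slope = sxy / sxx                  # expression change per extra allele copy
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    # The t statistic is unbounded when the fit is perfect.
    t = math.inf if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
    return slope, r, t


# Toy data (invented): six individuals, one variant, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r, t = eqtl_association(dosages, expression)
print(f"slope={slope:.3f}, r={r:.3f}, t={t:.2f}")
```

A variant that nobody in the sample carries in varying numbers (constant dosage) cannot be tested this way, which is one reason association-based mapping favors common alleles.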

Key facts

NIH application ID: 10925310
Project number: 5R35GM150513-02
Recipient: CLEMSON UNIVERSITY
Principal Investigator: Zhana Duren
Activity code: R35
Funding institute: NIH
Fiscal year: 2024
Award amount: $71,343
Award type: 5
Project period: 2023-09-08 → 2024-10-31