# Methods and Software for Large-Scale Gene-Environment Interaction Studies

> **NIH R01** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2021 · $796,568

## Abstract

Complex human diseases and related quantitative traits arise from the interplay of many risk factors, including genetic
and environmental components. Gene-environment interaction studies are a general framework that can be used
to identify genetic variations that modify environmental, physiological, lifestyle, or treatment effects, as well as
those contributing to age, sex, and racial/ethnic disparities in complex traits. Moreover, genetic association studies
accounting for gene-environment interactions are conducted to enhance our understanding of the genetic
architecture of complex diseases by allowing for different genetic effects in different exposure strata. With
recent advances in technology and falling costs, genetic and genomic data are being generated on very large
scales. However, commonly used statistical software programs for gene-environment interaction studies were
generally developed many years ago, and their computational algorithms have not been optimized to analyze
hundreds of thousands to millions of samples from possibly complex study designs. To bridge the gap between
the current and future analytical needs of large-scale gene-environment interaction studies and existing analytical
solutions, we plan to (Aim 1) develop efficient algorithms for common variant gene-environment interaction
analyses that scale linearly with the sample size; (Aim 2) develop new statistical tests for rare variant
gene-environment interaction analyses in the mixed effects model framework for correlated samples; and (Aim 3)
implement the proposed statistical methods and computational algorithms in new open-source software programs.
Our Aim 1 addresses current computational challenges in conducting gene-environment interaction studies in
up to millions of samples. In Aim 2, we plan to solve statistical and computational challenges in gene-environment
interaction analyses of large-scale whole genome sequencing data, accounting for relatedness, complex study
designs, as well as model misspecification. Aim 3 focuses on software development, and we will deliver
well-documented and user-friendly software packages and analysis pipelines for large-scale gene-environment
interaction studies. The methods and software programs will be applied to ongoing whole genome sequencing
projects, as well as biobank-scale data, and they will significantly facilitate the use of large-scale genetic and
genomic data for gene-environment interaction studies in the coming years to better understand the genetic basis
of complex cardio-metabolic, lung, blood, and sleep diseases and their age, sex, and racial/ethnic disparities, and
to promote personalized disease prevention and treatment strategies in precision health research.

## Key facts

- **NIH application ID:** 10199014
- **Project number:** 5R01HL145025-03
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** Han Chen
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $796,568
- **Award type:** 5
- **Project period:** 2019-07-15 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10199014

## Citation

> US National Institutes of Health, RePORTER application 10199014, Methods and Software for Large-Scale Gene-Environment Interaction Studies (5R01HL145025-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10199014. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
