# Computational Methods for Next-Generation GWAS

> **NIH F32** · UNIVERSITY OF OREGON · 2020 · $19,290

## Abstract

Predicting phenotypes from DNA sequence variation is a major goal for genetics, with potential
applications in evolutionary biology, crop breeding, and public health. A central challenge in this task is
separating genetic and environmental effects on phenotypes. In natural populations, breeding structure is often
correlated with the environment across space, such that different subpopulations experience different
environments. For genome-wide association studies (GWAS) this creates a problem: genetic and
environmental effects can be confounded by population structure, leading to inflated test statistics and low
predictive power across populations (Bulik-Sullivan et al. 2015; Mathieson and McVean 2012). Understanding
when association studies are biased by population stratification, and developing better methods to correct for it,
are thus important challenges for population genetics over the next decade.

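
To make "inflated test statistics" concrete, the sketch below uses a toy two-deme model that is an assumption for illustration, not the project's simulation. Allele frequencies drift apart between two demes (a crude Gaussian perturbation, not a formal drift model), and the phenotype has no genetic component at all, only a deme-specific environmental mean. Inflation is summarized with the genomic-control factor λ_GC, the median per-SNP χ² divided by the median of a 1-df χ² (about 0.455). All function names are hypothetical.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df

def simulate_two_demes(n_per_deme=500, n_snps=2000, drift_sd=0.1, env_shift=1.0, seed=0):
    """Two demes with perturbed allele frequencies; the phenotype has NO genetic
    component, only a deme-specific environmental mean plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, n_snps)
    p = np.clip(p_anc + rng.normal(0.0, drift_sd, (2, n_snps)), 0.01, 0.99)
    G = np.vstack([rng.binomial(2, p[d], size=(n_per_deme, n_snps))
                   for d in (0, 1)]).astype(float)
    deme = np.repeat([0.0, 1.0], n_per_deme)
    y = env_shift * deme + rng.normal(0.0, 1.0, 2 * n_per_deme)
    return G[:, G.std(axis=0) > 0], y  # drop any monomorphic sites

def marginal_chi2(G, y):
    """Per-SNP chi-square (squared t) from simple linear regression of y on each SNP."""
    n = G.shape[0]
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    r = (Gc.T @ yc) / (np.linalg.norm(Gc, axis=0) * np.linalg.norm(yc))
    return (n - 2) * r**2 / (1.0 - r**2)

def lambda_gc(chi2):
    """Genomic-control inflation factor; about 1 when test statistics are well calibrated."""
    return np.median(chi2) / CHI2_1DF_MEDIAN

lam_confounded = lambda_gc(marginal_chi2(*simulate_two_demes(env_shift=1.0)))
lam_null = lambda_gc(marginal_chi2(*simulate_two_demes(env_shift=0.0)))
print(f"lambda_GC, environment aligned with demes: {lam_confounded:.2f}")
print(f"lambda_GC, no environmental difference:    {lam_null:.2f}")
```

Every SNP here is truly null, so any λ_GC well above 1 in the first case comes entirely from the environment tracking ancestry.
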
To identify conditions under which existing methods of population stratification correction are subject to
bias, and to develop robust new alternatives suitable for the continental-scale genomic datasets that are
now routinely available for humans, we propose to use simulations and machine learning to separate the
signals of fine-scale ancestry from polygenic phenotype association. In our first aim, we will develop simulations
of polygenic phenotype evolution in continuous space and use the output to evaluate existing methods of
stratification control, including linear mixed models, principal component (PC) correction, and LD score regression.
In this aim we will seek to identify the regions of parameter space (i.e., the strength of isolation by distance and
the spatial distribution of environmental variation) in which existing methods can be expected to produce reliable
effect size estimates, and establish guidelines for applying GWAS to structured populations.

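
One evaluation of the kind described for the first aim can be sketched under strong simplifying assumptions. Individuals sit uniformly on a 1-D transect. Each SNP's allele frequency follows a logistic cline, a crude stand-in for isolation by distance rather than an explicit spatial population-genetic simulation. The environment is a linear gradient, and a small polygenic component is added. Stratification control is limited to PC correction: the top 10 PCs are projected out of both the phenotype and each SNP, which by the Frisch-Waugh-Lovell theorem gives the same per-SNP coefficient as including the PCs as regression covariates. Linear mixed models and LD score regression are not shown. All names and parameter values are hypothetical.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df

def simulate_cline(n=1000, n_snps=2000, n_causal=100, env_slope=2.0, seed=1):
    """Clinal allele frequencies along x in [0, 1], a linear environmental gradient,
    and a small polygenic component. Returns genotypes, phenotype, null-SNP indices."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)                 # position on the transect
    a = rng.normal(0.0, 1.0, n_snps)             # logit allele frequency at x = 0
    b = rng.normal(0.0, 1.0, n_snps)             # cline slope per SNP
    p = 1.0 / (1.0 + np.exp(-(a[None, :] + b[None, :] * x[:, None])))  # n x m
    G = rng.binomial(2, p).astype(float)
    G = G[:, G.std(axis=0) > 0]                  # drop monomorphic sites
    m = G.shape[1]
    causal = rng.choice(m, n_causal, replace=False)
    beta = np.zeros(m)
    beta[causal] = rng.normal(0.0, 0.05, n_causal)
    y = (G - G.mean(axis=0)) @ beta + env_slope * x + rng.normal(0.0, 1.0, n)
    return G, y, np.setdiff1d(np.arange(m), causal)

def top_pcs(G, k):
    """Top-k principal component scores of the standardized genotype matrix."""
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

def residualize(M, C):
    """Remove the projection of M's columns onto span([intercept, C])."""
    Q, _ = np.linalg.qr(np.column_stack([np.ones(M.shape[0]), C]))
    return M - Q @ (Q.T @ M)

def assoc_chi2(G, y, covariates):
    """Per-SNP squared t statistic after adjusting both G and y for the covariates."""
    k = covariates.shape[1]
    Gr = residualize(G, covariates)
    yr = residualize(y[:, None], covariates)[:, 0]
    r = (Gr.T @ yr) / (np.linalg.norm(Gr, axis=0) * np.linalg.norm(yr))
    df = G.shape[0] - 2 - k                      # intercept + k PCs + the SNP
    return df * r**2 / (1.0 - r**2)

def lambda_gc(chi2):
    return np.median(chi2) / CHI2_1DF_MEDIAN

G, y, null = simulate_cline()
no_covariates = np.empty((G.shape[0], 0))
lam_raw = lambda_gc(assoc_chi2(G, y, no_covariates)[null])
lam_pc = lambda_gc(assoc_chi2(G, y, top_pcs(G, 10))[null])
print(f"lambda_GC on null SNPs: uncorrected {lam_raw:.2f}, 10 PCs {lam_pc:.2f}")
```

Because the environment here is a smooth gradient along the same axis as the clines, the PCs can absorb it. A sharper or more localized environmental pattern is the kind of setting in which the abstract suggests PC correction may break down, and it can be explored by changing the `env_slope * x` term.
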
We will then train machine learning algorithms on real genotype data from humans and mosquitoes to
describe continuous structure in large spatial samples using a variational autoencoder (VAE). A VAE is a
dimensionality reduction technique based on deep neural networks; it can take advantage of both allele frequency
and haplotype-based measures of differentiation in a single analysis, and can thus offer improved control of
stratification inflation in GWAS relative to the now-standard PCA regression approach. Finally, we will apply deep
learning techniques to the problem of linking phenotypes and genotypes in structured samples by training neural
networks on simulated phenotypes and empirical genetic data. By training on empirical genetic data and
incorporating contextual information about the surrounding haplotype structure into the model, our networks
should learn to discriminate causal associations from false positives created by population structure in the
sample cohort, which will improve performance when identifying associations with the real phenotype.
These methods will be ap...
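
The core of a VAE is the objective it minimizes, shown here as a minimal NumPy sketch. This is not the project's model. It assumes genotypes coded 0/1/2 and rescaled to [0, 1], a Bernoulli-style decoder scored with binary cross-entropy, a 2-D latent space, and a small ReLU architecture, all of which are illustrative choices. The weights are random and untrained because fitting them needs gradients from an autodiff framework. The sketch shows only the encoder, the reparameterization trick, the decoder, and the loss (reconstruction error plus the KL divergence to a standard normal prior, i.e. the negative ELBO). `init_layer` and `vae_forward` are hypothetical names.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    """Random dense-layer weights (scaled normal) and zero biases."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

def relu(a):
    return np.maximum(0.0, a)

def vae_forward(G, params, rng):
    """One stochastic forward pass. Returns latent samples, the mean loss
    (negative ELBO), and the per-individual KL term."""
    (W_h, b_h), (W_mu, b_mu), (W_lv, b_lv), (W_d, b_d), (W_o, b_o) = params
    X = G / 2.0                                  # 0/1/2 genotypes rescaled to [0, 1]
    h = relu(X @ W_h + b_h)                      # encoder hidden layer
    mu, logvar = h @ W_mu + b_mu, h @ W_lv + b_lv
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    logits = relu(z @ W_d + b_d) @ W_o + b_o     # decoder output, one logit per site
    # Binary cross-entropy with logits, in the numerically stable form.
    recon = np.sum(np.maximum(logits, 0.0) - logits * X
                   + np.log1p(np.exp(-np.abs(logits))), axis=1)
    # KL divergence from q(z|x) = N(mu, diag(exp(logvar))) to the N(0, I) prior.
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return z, float(np.mean(recon + kl)), kl

rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(200, 500)).astype(float)  # toy genotypes, not real data
n_sites, n_hidden, n_latent = G.shape[1], 64, 2
params = [init_layer(rng, n_sites, n_hidden),   # encoder hidden
          init_layer(rng, n_hidden, n_latent),  # mean head
          init_layer(rng, n_hidden, n_latent),  # log-variance head
          init_layer(rng, n_latent, n_hidden),  # decoder hidden
          init_layer(rng, n_hidden, n_sites)]   # decoder output
z, loss, kl = vae_forward(G, params, rng)
print(z.shape, round(loss, 2))
```

In a trained model, the latent coordinates (`z`, or `mu` without sampling noise) could play the role that top PCs play in PC correction, for example as covariates in the association step.
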

## Key facts

- **NIH application ID:** 9910009
- **Project number:** 1F32GM136123-01
- **Recipient organization:** UNIVERSITY OF OREGON
- **Principal Investigator:** Christopher J Battey
- **Activity code:** F32 (individual postdoctoral fellowship)
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $19,290
- **Award type:** 1 (new application)
- **Project period:** 2020-05-01 → 2020-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9910009

## Citation

> US National Institutes of Health, RePORTER application 9910009, Computational Methods for Next-Generation GWAS (1F32GM136123-01). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/9910009. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*