High-throughput experimental determination and computational prediction of variant effects in yeast

NIH RePORTER · NIH · R01 · $320,557

Abstract

The broad objective of the proposed research is to achieve comprehensive understanding of the effects of DNA sequence variants on complex and quantitative traits in the yeast S. cerevisiae, arguably the most powerful eukaryotic model system due to its small genome, ease of genetic manipulation, and the ability to generate very large sample sizes. Evolutionary conservation has also ensured that many yeast traits have direct parallels to biomedically important human phenotypes. We seek to comprehensively identify the DNA loci and the candidate sequence variants within them that underlie genetic variation in fitness and expression traits, experimentally engineer and test the effects of variants on a massive scale, and build rules for predicting the functional effects of variants of unknown significance. Success in answering these questions will provide critical guidance for the design of genotype-phenotype studies in humans and other organisms of medical, biological, and agricultural interest, and enable improved diagnostic accuracy based on genome sequencing of patients.

Specifically, we will leverage single-cell sequencing to massively scale up genetic mapping in order to increase statistical power and resolution of rare and common variant discovery. We will generate a mapping population of millions of genetically diverse yeast by using CRISPR/Cas9 and other strain engineering tools to facilitate crossing, selection of haploids, and incorporation of DNA barcodes. The mapping population will be genotyped and phenotyped for both genome-wide transcript abundance and multiple fitness traits, to provide a much richer sampling of the regulatory and other functional effects of natural yeast genetic variants, particularly rare genetic variants.

We will then employ a CRISPR/Cas9-based strategy to engineer variants in parallel on a large scale and assess their effects in yeast through phenotypic assays. This approach involves designing libraries of edit-directing plasmids that incorporate a specific variant at the target location. The phenotypic assays rely on DNA barcoding and ultra-high-throughput sequencing. We will use statistical approaches to analyze the data and control for errors and false discoveries. We also plan to validate the efficiency of the system and improve the experimental design by evaluating the effects of various parameters.

Finally, we will build improved predictive models of variant effects in yeast, using large data sets generated here in conjunction with existing yeast functional information. The focus will be on predicting the effects of coding and non-coding variants on fitness traits, with machine learning and artificial intelligence models that incorporate local sequence context, eQTL effects, and variant features such as allele frequency, functional scores, and evolutionary conservation. The performance of the predictive model will be evaluated on newly engineered variants, and the models will be impro...

Key facts

NIH application ID: 10802965
Project number: 2R01GM102308-10
Recipient: UNIVERSITY OF CALIFORNIA LOS ANGELES
Principal Investigator: LEONID KRUGLYAK
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $320,557
Award type: 2
Project period: 2012-09-01 → 2028-06-30