# The functional and phenotypic effects of protein coding genetic variation

> **NIH R35** · HARVARD MEDICAL SCHOOL · 2024 · $423,750

## Abstract

My lab develops statistical methods to characterize genetic architecture and to translate genetic data into biological insights. We are broadly interested in rare variant genetic architecture and its relationship with that of common variants. A particular area of focus is rare protein coding variation and its functional consequences.

Among all known rare-variant associated genes, the majority are driven by protein truncating variants (PTVs). PTVs are consistently deleterious, which is a benefit for association testing, but they make up <10% of coding variants; other types of coding variants – missense, splice site, UTR variants – are functionally heterogeneous, with a spectrum of effects including loss, gain and change of function. Their functional heterogeneity poses a challenge when integrating them into existing statistical methods, but it also presents an opportunity biologically. This application focuses on the question: what is the relationship between the functional effect of a mutation and its phenotypic effect on individuals?

Proteins and protein coding variants are richly annotated. Variant effect prediction (VEP) has emerged as one of the most successful applications of artificial intelligence in biology; for missense variants alone, at least five sophisticated models (EVE, gMVP, ESM-1b, PrimateAI-3D, and AlphaMissense) were published in 2021-2023, with clear implications for association testing. Other functional annotations, including protein structure predictions, predicted effects on protein binding, and experimental measurements of variant effects, complement VEP methods by capturing the functional heterogeneity of equally pathogenic alleles.

We aim to understand the relationship between the functional and phenotypic effects of protein coding variation by integrating genetic association data with a wide range of protein coding functional annotations. We will identify and characterize functionally informed allelic series: multiple independently associated alleles in a gene, whose heterogeneous phenotypic effects align with their functional properties. To do so, we will use two complementary approaches, focusing on individual genes and on the exome as a whole. By analyzing individual genes with long, statistically unambiguous allelic series, we will identify functional annotations that are relevant, and we will learn how genes differ from each other in the functional effects of trait-associated variants within them. Then, we will analyze the functional architecture of protein-coding variation using the functionally informed allelic series model, which integrates any number of structural or functional annotations with genetic association data for one or more traits.

In GWAS, the combination of genetic association data with regulatory annotations – using integrative statistical methods – has been among the most productive sources of biological insight, both in genome-wide scans and in the di...

## Key facts

- **NIH application ID:** 10939604
- **Project number:** 1R35GM155278-01
- **Recipient organization:** HARVARD MEDICAL SCHOOL
- **Principal Investigator:** Luke Jen O'Connor
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $423,750
- **Award type:** 1
- **Project period:** 2024-07-01 → 2029-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10939604

## Citation

> US National Institutes of Health, RePORTER application 10939604, The functional and phenotypic effects of protein coding genetic variation (1R35GM155278-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10939604. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
