# SHAPEIT+Salmon: haplotype phasing and RNA-seq quantification for allele-specific eQTL mapping

> **NIH R21** · CARNEGIE-MELLON UNIVERSITY · 2020 · $210,643

## Abstract

Allele-specific expression quantitative trait locus (eQTL) mapping has become increasingly popular because it enhances traditional eQTL mapping by revealing significantly more detailed gene regulatory mechanisms underlying the genetic architecture of diseases. Allele-specific eQTL mapping identifies cis-acting and trans-acting eQTLs, which pinpoint cis-regulatory elements and trans-acting factors, respectively, by leveraging the fact that, unlike trans-acting eQTLs, cis-acting eQTLs affect the expression of transcripts from the same haplotype as the variant itself, causing allelic imbalance in expression. However, allele-specific eQTL mapping requires reliable long-range phasing of genome sequences and accurate allele-specific expression quantification from RNA-seq data that is consistent with the genome phasing. Most existing work has treated allele-specific expression quantification and phasing as independent tasks, even though each can enhance the accuracy of the other. In this proposed research, we will modify and pair two widely used tools, SHAPEIT for genome phasing and Salmon for RNA-seq quantification, to obtain accurate phasing and allele-specific expression quantification that are consistent with each other for allele-specific eQTL mapping. The combined tool will inherit or enhance the accuracy and efficiency of the two original methods. If phased sequences are known from experimental or trio data, we will replace the EM algorithm of Salmon with an accelerated EM to address the extreme multi-mapped read problem with computational efficiency. If phased sequences are not available, as in unrelated individuals, we will modify SHAPEIT to jointly phase the variants and allele-specific read abundances, embedding allele-specific expression quantification within SHAPEIT and using Salmon to obtain transcript quantification and allele-specific read abundances. As a testbed, we will use genotype and RNA-seq data from a 50-generation intercross between two inbred mouse strains. Because these data are derived from two fully sequenced inbred founders, the correct phase is known. Although we use mice as a testbed, our approach is applicable to data from any disease, tissue, or organism, including GTEx data.
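
To illustrate the multi-mapped read problem the abstract refers to, the following is a minimal sketch of a plain EM update that splits ambiguous reads between two haplotype copies of one transcript. It is not Salmon's implementation and not the accelerated EM proposed in the project; the function name `em_abundance`, the read-compatibility encoding, and the read counts are all hypothetical choices made for this example.

```python
def em_abundance(compat, n_targets, n_iter=200, tol=1e-10):
    """Toy EM for relative abundances (illustrative only, not Salmon's code).

    compat[r] lists the target indices that read r is compatible with;
    every read must be compatible with at least one target.
    """
    theta = [1.0 / n_targets] * n_targets  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_targets
        # E-step: share each read among its compatible targets
        # in proportion to the current abundance estimates.
        for targets in compat:
            total = sum(theta[t] for t in targets)
            for t in targets:
                counts[t] += theta[t] / total
        # M-step: renormalize the expected counts into abundances.
        new_theta = [c / len(compat) for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Hypothetical data: target 0 = haplotype A copy, target 1 = haplotype B copy.
reads = (
    [[0]] * 30       # reads carrying the A allele at a heterozygous site
    + [[1]] * 10     # reads carrying the B allele
    + [[0, 1]] * 60  # reads covering no heterozygous site (ambiguous)
)
theta = em_abundance(reads, n_targets=2)
print([round(x, 4) for x in theta])  # approaches 0.75 and 0.25
```

In this toy setting the ambiguous reads end up split in the same 3:1 ratio as the allele-informative reads, which is the allelic imbalance a cis-acting eQTL would be expected to produce. Only reads overlapping a heterozygous site carry haplotype information, so the allele-specific estimate depends on the phasing being correct.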

## Key facts

- **NIH application ID:** 9958822
- **Project number:** 1R21HG011116-01
- **Recipient organization:** CARNEGIE-MELLON UNIVERSITY
- **Principal Investigator:** Seyoung Kim
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $210,643
- **Award type:** 1
- **Project period:** 2020-05-01 → 2022-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9958822

## Citation

> US National Institutes of Health, RePORTER application 9958822, SHAPEIT+Salmon: haplotype phasing and RNA-seq quantification for allele-specific eQTL mapping (1R21HG011116-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/9958822. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
