# Computational methods for variant calling and haplotyping using long-read sequencing technologies

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2022 · $381,631

## Abstract

In this project, we propose to develop computational methods and tools for whole-genome haplotyping and
small variant calling using long-read sequencing technologies, such as Pacific Biosciences and Oxford
Nanopore, and linked-read technologies. Haplotype information is crucial for the interpretation of genetic variation
in individual genomes, disease mapping, clinical genomics, and several other analyses of human genetic
variation. The lack of phase or haplotype information in human genomes sequenced using short reads is a
major barrier in identifying disease associations with compound heterozygous mutations. More than 600 genes
overlap segmental duplications with high sequence identity, and variants in more than 100 such genes have
been associated with rare Mendelian disorders and complex diseases including cancer. The inability to detect
variants with high accuracy in duplicated regions of the genome using short-read sequencing technologies
reduces the ability to identify disease-causing mutations in medical genetics studies. In Aim 1, we will develop
a general computational method for long-read-based diploid genotyping that will enable accurate haplotyping
of single nucleotide variants and short indels using long reads and linked reads, as well as accurate small
variant calling using single-molecule sequencing (SMS) technologies. In Aim 2, we will develop computational methods for sensitive mapping
of SMS reads and accurate variant calling in repetitive regions of the human genome that are currently
excluded from benchmark small variant call sets for reference human genomes. Finally, in Aim 3, we will
use the methods from Aims 1 and 2 to perform variant calling on multiple genomes sequenced using SMS
technologies to catalog variant PSVs, and leverage this catalog to improve read mapping and variant calling
accuracy of short-read sequencing in repetitive regions of the genome. We will implement the methods in
robust and computationally efficient software tools and benchmark their accuracy using publicly available
long-read sequence datasets for multiple human genomes of diverse ancestries.
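
The abstract's point about compound heterozygous mutations can be illustrated with a minimal sketch. It is not part of the proposed software. The function name `is_compound_heterozygous` and the tuple encoding of phased genotypes are hypothetical, and the sketch assumes both variants lie in the same gene and the same phase block. Two heterozygous variants share the same unphased genotype (0/1 and 0/1) whether they sit on one haplotype (cis) or on opposite haplotypes (trans); only phase information separates the two cases.

```python
from typing import Tuple

# Hypothetical encoding: (allele on haplotype 1, allele on haplotype 2),
# where 0 = reference allele and 1 = alternate allele.
Phased = Tuple[int, int]


def is_compound_heterozygous(variant_a: Phased, variant_b: Phased) -> bool:
    """Return True when two heterozygous variants lie on opposite haplotypes (trans).

    Assumes both variants are in the same gene and the same phase block.
    """
    both_het = sum(variant_a) == 1 and sum(variant_b) == 1
    return both_het and variant_a != variant_b


# Identical unphased genotypes, two different phased configurations:
cis = ((1, 0), (1, 0))    # both alternate alleles on haplotype 1; haplotype 2 is unaffected
trans = ((1, 0), (0, 1))  # one alternate allele per haplotype; both gene copies are altered

print("cis:", is_compound_heterozygous(*cis))
print("trans:", is_compound_heterozygous(*trans))
```

Without phase, both configurations look the same to a short-read genotyper, which is why the trans case is hard to identify from unphased calls alone.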

## Key facts

- **NIH application ID:** 10441522
- **Project number:** 5R01HG010759-03
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Vikas Bansal
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $381,631
- **Award type:** 5
- **Project period:** 2020-09-01 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10441522

## Citation

> US National Institutes of Health, RePORTER application 10441522, Computational methods for variant calling and haplotyping using long-read sequencing technologies (5R01HG010759-03). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10441522. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*