# Personal and panel references for improved alignment

> **NIH R01** · JOHNS HOPKINS UNIVERSITY · 2021 · $351,576

## Abstract

PROJECT SUMMARY
Next-generation sequencing is ubiquitous in the study of biology and disease. The first step when analyzing a sequencing dataset is read alignment: the process of determining where each snippet of sequencing data (“read”) came from with respect to a reference genome. Currently, genomics research is hampered by the use of a single, arbitrary reference. This fails to account for the vast genetic diversity that exists among humans and model organisms. Further, it can result in “reference bias,” in turn leading to false or misleading scientific results.

We propose a three-aim project that addresses the reference bias problem on multiple fronts. In Aim 1, we will develop new methods and a new software tool called biastools for summarizing and visualizing reference bias. In Aim 2, we will develop new software and methods that address reference bias by enabling alignment to multiple representative reference genomes. In one subproject, we will use genotype imputation to infer a personalized genome with the help of a large panel of reference haplotypes. In a second subproject, we will use small collections of representative genomes connected in a “flow graph,” so that reads are ultimately analyzed with respect to the most appropriate reference. The methods described in both subprojects will be implemented as part of a new software tool called pals. Also as part of this aim, we will release a software library and tool called jector for transforming alignments from one reference coordinate system to another. Finally, for Aim 3, we apply a novel text-indexing method called r-index to enable alignment of reads to large panels of reference haplotypes. We will release the software as a software library and tool called pandex.

Successful completion of the project will provide the community with new methods and references that leverage the genetic information we are gleaning from large-scale genotyping studies and from new long-read assemblies. All software will be made available under an open source license.
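
As a rough illustration of the “reference bias” the abstract refers to, one simple way to summarize it is the fraction of aligned reads that carry the reference allele at sites known to be heterozygous. With unbiased alignment that fraction should sit near 0.5, and values well above 0.5 suggest reads with the alternate allele are being lost or misplaced. The sketch below is only an assumption about how such a summary could look; it is not the biastools method, and the `HetSite` type, function names, and read counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HetSite:
    """Read counts at one heterozygous site (hypothetical record type)."""
    name: str
    ref_reads: int  # aligned reads carrying the reference allele
    alt_reads: int  # aligned reads carrying the alternate allele


def ref_fraction(site: HetSite) -> Optional[float]:
    """Fraction of reads supporting the reference allele, or None if no coverage."""
    total = site.ref_reads + site.alt_reads
    if total == 0:
        return None
    return site.ref_reads / total


def mean_ref_fraction(sites: List[HetSite]) -> Optional[float]:
    """Average reference-allele fraction over all covered sites."""
    fractions = [f for f in (ref_fraction(s) for s in sites) if f is not None]
    if not fractions:
        return None
    return sum(fractions) / len(fractions)


# Made-up counts, for illustration only.
sites = [
    HetSite("site_a", ref_reads=12, alt_reads=8),
    HetSite("site_b", ref_reads=15, alt_reads=5),
    HetSite("site_c", ref_reads=10, alt_reads=10),
]

if __name__ == "__main__":
    for s in sites:
        print(s.name, ref_fraction(s))
    print("mean reference-allele fraction:", mean_ref_fraction(sites))
```

A per-site average like this treats every site equally regardless of coverage; pooling read counts across sites would be an equally reasonable alternative.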

## Key facts

- **NIH application ID:** 10242948
- **Project number:** 5R01HG011392-02
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** Benjamin Thomas Langmead
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $351,576
- **Award type:** 5
- **Project period:** 2020-09-01 → 2025-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10242948

## Citation

> US National Institutes of Health, RePORTER application 10242948, Personal and panel references for improved alignment (5R01HG011392-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10242948. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
