Tools for comprehensive variant characterization using the pangenome

NIH RePORTER · NIH · U01 · $1,696,968

Abstract

This grant proposal outlines a plan to develop novel computational methods and software tools for analyzing pangenomic data, with a focus on improving the accuracy and efficiency of variant calling and genotyping, particularly for complex structural variants (SVs). The proposal is divided into five specific aims:

Aim 1: Create a pangenome mapper supporting long reads, which will enable accurate and efficient mapping of long-read sequencing data to pangenome references.

Aim 2: Develop personalized pangenomes, which involves rapidly constructing a subset of a larger graph based on an input sample's k-mers. This approach will tailor the pangenome to a specific sample and so improve the performance of downstream analysis.

Aim 3: Create a pangenome variant calling and imputation method for unified genome inference, which will combine imputation with read-based genotyping using machine learning to infer a more complete representation of variation, including both small variants and SVs.

Aim 4: Genotype complex SVs involving protein-coding genes, which will involve identifying long segmental duplications, grouping haplotypes, and developing targeted genotyping methods for long and short reads.

Aim 5: Develop mature rGFA-based variant calling for reporting both SVs and small variants within polymorphic sequence, which will expand the current definition of reportable variation and provide pipelines that can report tens of thousands of additional variants per sample.

The proposal highlights the need for better computational tools for pangenome analysis, especially for complex SVs. The proposed software tools and methods will enable researchers to analyze pangenomic data more effectively and efficiently, leading to new insights into genetic variation and its role in disease and other biological processes.
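The idea behind Aim 2, selecting a sample-specific subset of a larger pangenome from the sample's k-mers, can be illustrated with a minimal sketch. This is not the project's method: the abstract does not describe the selection procedure, so the scoring rule here (the fraction of a haplotype's k-mers that also occur in the sample's reads) is an assumption chosen for simplicity. The function names, the parameters `k`, `keep`, and `min_count`, and the toy sequences are all hypothetical. For brevity the sketch treats haplotypes as plain strings rather than graph paths and ignores reverse complements.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmer_counts(reads, k):
    """Count every k-mer occurrence across the sample's reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def select_haplotypes(haplotypes, reads, k=5, keep=2, min_count=1):
    """Rank haplotypes by k-mer support in the sample and keep the top few.

    Assumed scoring rule (not from the abstract): the fraction of a
    haplotype's distinct k-mers seen at least `min_count` times in the reads.
    """
    counts = sample_kmer_counts(reads, k)
    present = {km for km, c in counts.items() if c >= min_count}
    scores = {}
    for name, seq in haplotypes.items():
        hap_kmers = kmers(seq, k)
        scores[name] = len(hap_kmers & present) / len(hap_kmers) if hap_kmers else 0.0
    # Highest score first; ties broken by name so the result is deterministic.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:keep], scores


# Toy data: three hypothetical haplotypes and two reads drawn from hapA.
haplotypes = {
    "hapA": "ACGTACGGTCA",
    "hapB": "ACGTTTGGTCA",
    "hapC": "TTTTTTTTTTT",
}
reads = ["ACGTACGG", "TACGGTCA"]
selected, scores = select_haplotypes(haplotypes, reads, k=5, keep=2)
print(selected, scores)
```

In a real pipeline the retained haplotypes would define the subgraph handed to the mapper and genotyper. The sketch only shows the ranking step, which is where the sample's k-mers enter.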

Key facts

NIH application ID: 10976565
Project number: 1U01HG013748-01
Recipient: UNIVERSITY OF CALIFORNIA SANTA CRUZ
Principal Investigator: Heng Li
Activity code: U01
Funding institute: NIH
Fiscal year: 2024
Award amount: $1,696,968
Award type: 1
Project period: 2024-09-23 → 2027-08-31