Accelerating genomic analysis for time critical clinical applications

NIH RePORTER · NIH · R21 · $170,988

Abstract

Genome-scale DNA sequencing, now available at dramatically reduced cost, has revolutionized the practice of precision medicine. An entire human genome can be sequenced in roughly one day, but bioinformatic analysis typically takes days or weeks. Analysis has therefore become the major bottleneck to using genome sequencing in time-critical applications, for example identifying the genomic vulnerabilities of a patient's tumor to select a rational cancer treatment within a clinically relevant timeframe. The overarching goal of this proposal is to dramatically speed up genomic analysis algorithms using heterogeneous computing techniques. We will focus on one critical aspect of genomic analysis, variant calling, and set the ambitious goal of analyzing a 60X-coverage Illumina whole-genome sequencing dataset in under 10 minutes, far faster than the current state of the art. Although applied here to only one analysis task, achieving this degree of acceleration would provide strong evidence that the techniques developed in this proposal generalize to many other genomic analysis tasks.

Our approach is to first accelerate the most widely reusable software components, maximizing their value to the community of genomic analysis tool developers, who will then be able to integrate these components into their own tools. Using these reusable components, we will accelerate the FreeBayes variant caller. FreeBayes is a widely used tool for detecting germline variants and somatic mutations, so accelerating it will benefit a large user community. FreeBayes was developed in our own laboratory, and we are intimately familiar with its algorithms and code base, which positions us well for success in this exploratory project. If successful, our technique will be applicable to accelerating many analysis tasks that are currently time-consuming.

As a result, analysts will be able to finish sophisticated data processing tasks within minutes, as part of an interactive analysis session rather than as a batched background process, and then review the results manually immediately afterward. This would make the complete analysis process fast enough for time-critical clinical applications.
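To put the stated target (a 60X-coverage whole genome analyzed in under 10 minutes) into perspective, the back-of-envelope calculation below estimates the sustained throughput such a pipeline would need. It is a minimal sketch, not part of the proposal: the genome length of about 3.1 billion bases and the 150 bp read length are assumptions chosen for illustration, and the function name `required_throughput` is hypothetical. Only the 60X coverage and the 10-minute budget come from the abstract.

```python
# Back-of-envelope throughput needed for the abstract's stated target:
# a 60X-coverage human whole genome processed in under 10 minutes.
# ASSUMPTIONS (not from the abstract): ~3.1e9 bp genome, 150 bp reads.

GENOME_LENGTH_BP = 3.1e9   # assumed approximate human genome length
COVERAGE = 60              # from the abstract: 60X coverage
READ_LENGTH_BP = 150       # assumed typical Illumina short-read length
TARGET_SECONDS = 10 * 60   # from the abstract: under 10 minutes


def required_throughput(genome_bp, coverage, read_len, seconds):
    """Hypothetical helper: return (total_bases, total_reads,
    bases_per_second, reads_per_second) for the given target."""
    total_bases = genome_bp * coverage      # sequenced bases to process
    total_reads = total_bases / read_len    # number of reads to process
    return (total_bases, total_reads,
            total_bases / seconds, total_reads / seconds)


bases, reads, bps, rps = required_throughput(
    GENOME_LENGTH_BP, COVERAGE, READ_LENGTH_BP, TARGET_SECONDS
)
print(f"total sequenced bases: {bases:.3g}")
print(f"total reads:           {reads:.3g}")
print(f"bases per second:      {bps:.3g}")
print(f"reads per second:      {rps:.3g}")
```

Under these assumptions the pipeline would have to sustain roughly 3.1e8 bases, or about two million reads, per second for the full ten minutes, which illustrates why the proposal turns to heterogeneous computing rather than incremental single-core optimization.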

Key facts

NIH application ID: 10763012
Project number: 5R21CA271098-02
Recipient: UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
Principal Investigator: Gabor T Marth
Activity code: R21
Funding institute: NIH
Fiscal year: 2024
Award amount: $170,988
Award type: 5
Project period: 2023-01-10 → 2024-12-31