Machine learning approaches for improved accuracy and speed in sequence annotation: supplement for software enhancement

NIH RePORTER · NIH · R01 · $221,904

Abstract

The goal of the parent grant for this supplement request is to develop machine learning approaches that improve both the accuracy and the speed of highly sensitive sequence database search and alignment. As part of this effort to annotate genomes correctly, we have developed three software tools: (i) ULTRA, which labels repetitive sequence; (ii) PolyA, which integrates such labels with other sequence annotations in a probabilistic framework, computing annotation uncertainty and improving accuracy; and (iii) SODA, which aids in visualizing annotations and their supporting evidence. Here, we describe a plan to refactor these tools and their documentation to improve robustness and reliability, and to make them more widely available through package management systems and incorporation into cloud-based analysis frameworks.

Key facts

NIH application ID: 10406630
Project number: 3R01GM132600-03S1
Recipient: UNIVERSITY OF MONTANA
Principal Investigator: Travis John Wheeler
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $221,904
Award type: 3
Project period: 2019-09-20 → 2023-07-31