Developing the Apollo software for high-throughput annotation of multiple genomes

NIH RePORTER · NIH · R01 · $461,731

Abstract

Genome sequencing is becoming cheaper and more powerful. However, a bottleneck to scientific progress using these data is the generation of high-quality genome annotations, which describe the location and function of the genes in genomic DNA. In the past, these annotations were generated exclusively by professional biocurators, who remain a critical source of expertise. However, the number of such biocurators is small compared to the amount of genomic DNA that is available. In recent years, professional biocuration has been supplemented by a growing volunteer force of biologists, each generally interested in one or two genes. Although genome annotation is often done in bulk using computational tools, there is currently no substitute for a final pass of manual annotation. This is most commonly done using our Apollo software, which allows for live, simultaneous, collaborative annotation over the web. Our proposal here is to improve Apollo so as to empower both professional biocurators and crowdsourcing volunteers. We will empower professional biocurators by giving them the power tools they need to annotate multiple genomes simultaneously (by exploiting synteny), annotate variants, and annotate the function of genes. We will empower the crowdsourcing volunteers by lowering barriers to entry and making Apollo more usable. For all users, we will train machine learning systems to automatically detect common annotation errors and suggest improvements. We will support Apollo users with maintenance, bug fixes, and responses to feature requests, as well as extensive outreach, including outreach to developers.

Key facts

NIH application ID: 10736567
Project number: 2R01GM080203-15
Recipient: UNIVERSITY OF CALIFORNIA BERKELEY
Principal Investigator: Ian H Holmes
Activity code: R01
Funding institute: NIH
Fiscal year: 2023
Award amount: $461,731
Award type: 2
Project period: 2007-08-01 → 2027-08-31