# Addressing Open Challenges of Computational Genome Annotation

> **NIH R01** · GEORGIA INSTITUTE OF TECHNOLOGY · 2020 · $342,390

## Abstract

We propose to capitalize on the success of the ongoing collaboration between the bioinformatics
teams at the University of Greifswald (Germany) and the Georgia Institute of Technology (USA)
to address open challenges in computational genome annotation. In the course of this
development, we plan to implement new algorithmic ideas and meet the need for unbiased
integration of different types of OMICS data.

We plan to address one of the long-standing problems at the interface of bioinformatics and
machine learning: automatic generative and discriminative parameterization of gene-finding
algorithms. Current methods of combining OMICS evidence frequently yield tools that
under-predict or over-predict. Drawing on a good understanding of the difficulties and properties of
different types of OMICS evidence, we propose an optimized approach to fully unsupervised,
generative and discriminative training.

We will introduce novel means to optimize the integration of multiple types of OMICS evidence into gene
prediction. These ideas will further develop the protein family-based gene finding implemented
in AUGUSTUS-PPX. We propose to create representations of protein families for gene finding
that, for the first time, include cross-species gene structure information.

We will develop a new approach that unifies two advanced research areas: transcript
reconstruction from RNA-Seq, and statistical gene finding that integrates RNA-Seq and homology
information. We will describe a new, comprehensive model and an EM-like algorithmic technique
(the “wholistic” approach) to identify the sets of transcripts and their expression levels that best fit
the available OMICS evidence.

We will also develop an automatic gene-finding algorithm for the full content of metagenomes,
including eukaryotic and viral metagenomic sequences. This task is conventionally considered
too challenging. We propose a solution that exploits and advances the algorithmic ideas and
approaches we mastered in the course of creating gene finders for prokaryotic metagenomes
as well as eukaryotic genomes.

All new tools will be available to the community under open-source licenses.

## Key facts

- **NIH application ID:** 9975182
- **Project number:** 5R01GM128145-03
- **Recipient organization:** GEORGIA INSTITUTE OF TECHNOLOGY
- **Principal Investigator:** MARK BORODOVSKY
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $342,390
- **Award type:** 5
- **Project period:** 2018-09-01 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9975182

## Citation

> US National Institutes of Health, RePORTER application 9975182, Addressing Open Challenges of Computational Genome Annotation (5R01GM128145-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9975182. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
