# Identifying genetic code reassignments in nucleotide sequence databases

> **NIH F31** · HARVARD UNIVERSITY · 2021 · $32,666

## Abstract

Biological discoveries made in other organisms tell us about the functions of human
genes because of the ability to compare homologous protein sequences. Recent efforts to
sequence a greater diversity of species for comparative analysis have been primarily done on
the DNA level, and protein sequences are subsequently translated in silico assuming some
genetic code. However, there is currently no informed way of selecting the correct genetic code
for a newly sequenced organism, which is critical for the correct translation of predicted protein
sequences. As more diverse organisms are sequenced, species using variant genetic codes
continue to be found, suggesting that there may be a hidden diversity of alternative genetic
codes across the tree of life.

Aim 1 proposes building a computational tool to predict the genetic code used by an
organism from nucleotide sequence alone. This would fill in a critical missing step in genome
annotation pipelines and would ensure the accuracy of protein sequence databases, which are
predominantly composed of predicted protein sequences. In Aim 2, the computational tool will
be used to infer the genetic code usage of all publicly available genomes and validate any new
genetic codes by computational analysis of tRNA genes, experimental confirmation of tRNA
expression via Northern blotting, and confirmation of altered codon translation via proteomic
mass spectrometry. In Aim 3, the updated distribution of alternative genetic codes will be used
to address long-standing hypotheses in the field about how the genetic code is thought to
evolve.

This research training plan is intended to prepare the PI for a career as an independent
and interdisciplinary researcher. The training environment will be in a collaborative
computational laboratory, with access to a lab bench and shared lab equipment to do the
proposed experiments. The training plan will also include development of science
communication skills, including oral presentations and writing.

## Key facts

- **NIH application ID:** 10075793
- **Project number:** 5F31HG010984-02
- **Recipient organization:** HARVARD UNIVERSITY
- **Principal Investigator:** Yekaterina Shulgina
- **Activity code:** F31
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $32,666
- **Award type:** 5
- **Project period:** 2019-12-16 → 2021-12-15

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10075793

## Citation

> US National Institutes of Health, RePORTER application 10075793, Identifying genetic code reassignments in nucleotide sequence databases (5F31HG010984-02). Retrieved via AI Analytics 2026-06-14 from https://api.ai-analytics.org/grant/nih/10075793. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*