# Harmony AI: State of the Art Natural Language Processing for Genetic Engineering

> **NIH R43** · CFD RESEARCH CORPORATION · 2023 · $343,491

## Abstract

Computational techniques for gene engineering, such as codon optimization, use synonymous
codon changes to increase protein production. Applications for these computational gene
optimizations include recombinant protein drugs, nucleic acid therapies, and mRNA vaccines.
Although codon optimization increases protein production in certain systems, synonymous
changes to a gene sequence can have unexpected detrimental effects on the protein. Further,
researchers have been critical of codon optimization for human therapeutics as the optimization
process can affect protein conformation and function, and reduce efficacy. Therefore, codon
optimization may not provide an optimal strategy for increasing protein production or designing
safe and effective therapeutics. CFDRC has utilized state-of-the-art natural language processing
techniques to learn how synonymous codons are used by a target organism and apply this learning
to gene engineering. We demonstrated that our model could predict *E. coli* synonymous codon
usage with 73% accuracy, significantly above prior reports. We believe that using this AI-based
approach to gene engineering will provide an optimal strategy for increasing protein production
and may increase the efficacy of therapeutics.
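
To make the terms above concrete, here is a minimal Python sketch of conventional frequency-based codon optimization and of a per-codon accuracy score. It is not CFDRC's language model, which the abstract does not describe in detail. It assumes the standard genetic code, and all function names and the toy sequences are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(seq):
    """Split an in-frame DNA coding sequence into a list of codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(seq))


def preferred_codons(reference_seqs):
    """Return the most frequent codon per amino acid in the reference genes."""
    counts = defaultdict(Counter)
    for seq in reference_seqs:
        for codon in split_codons(seq):
            counts[CODON_TO_AA[codon]][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def naive_optimize(seq, preferred):
    """Swap each codon for the host's preferred synonymous codon.

    The encoded protein is unchanged; only the codon choice differs.
    """
    return "".join(
        preferred.get(CODON_TO_AA[codon], codon) for codon in split_codons(seq)
    )


def codon_prediction_accuracy(seq, preferred):
    """Fraction of positions where the preferred codon matches the actual one.

    This scores a context-free baseline predictor, not a learned model.
    """
    codons = split_codons(seq)
    hits = sum(preferred.get(CODON_TO_AA[codon]) == codon for codon in codons)
    return hits / len(codons)


if __name__ == "__main__":
    # Toy reference "genome" and target gene (hypothetical, not real E. coli data).
    reference = ["ATGCTGCTGCTGAAAAAAAAGTAA"]
    target = "ATGTTAAAGTAA"

    preferred = preferred_codons(reference)
    optimized = naive_optimize(target, preferred)

    print(optimized)
    print(translate(target) == translate(optimized))
    print(codon_prediction_accuracy(target, preferred))
```

The baseline picks one codon per amino acid regardless of sequence context. A context-aware predictor of the kind the abstract describes would be scored with the same per-codon accuracy measure, with its own predictions in place of the `preferred` lookup.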

## Key facts

- **NIH application ID:** 10698805
- **Project number:** 1R43GM150352-01
- **Recipient organization:** CFD RESEARCH CORPORATION
- **Principal Investigator:** David Gaddes
- **Activity code:** R43 (SBIR Phase I)
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $343,491
- **Award type:** 1
- **Project period:** 2023-09-22 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10698805

## Citation

> US National Institutes of Health, RePORTER application 10698805, Harmony AI: State of the Art Natural Language Processing for Genetic Engineering (1R43GM150352-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10698805. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
