# Advancing Protein Engineering Using Artificial Intelligence and the ProtaBank Mutation Database

> **NIH R44** · PROTABIT, LLC · 2020 · $495,160

## Abstract

Therapeutic antibodies, specialized enzymes for drug manufacturing, small molecule drug screening agents,
and other proteins have been instrumental in advancing biotechnology and medicine. Protein therapeutics
alone represent a rapidly growing $100+ billion market with broad applications in the treatment of cancer,
inflammatory and metabolic diseases, and numerous other disorders. Most of the antibodies and other protein
therapeutics developed in the last several years have been engineered, leading to improvements in important
properties such as efficacy, binding affinity, expression, stability, and immunogenicity. However, improving
protein properties through sequence modification remains a challenging task. Artificial intelligence (AI), which
has been enormously successful in several fields (e.g., image recognition, self-driving cars, natural language
processing), is now being applied to protein engineering and has the potential to transform this field as well. AI
and machine learning (ML) can take advantage of large and diverse datasets to identify correlations, predict
beneficial mutations, and explore novel protein sequences in ways that are not possible using other
techniques. Other advantages include the ability to simultaneously optimize multiple protein properties and
explore sequence space more efficiently. In Phases I and II of this project, we developed the ProtaBank
database as a central repository to store, organize, and annotate protein mutation data spanning a broad
range of properties. ProtaBank is the largest and only database actively collecting such a comprehensive set of
sequence mutation data and is growing rapidly due to the wealth of data being generated with advanced
automation and next-generation sequencing techniques. ProtaBank's depth and breadth make it an ideal data
source to train ML models. This proposal aims to create the ProtaBank AI Platform to enable the use of AI and
ML tools to apply the data in ProtaBank to engineer proteins. The platform will provide fully customizable
computational tools and will invoke protein-specific knowledge to properly prepare data for use with ML
models. An interface to popular ML frameworks will be provided so that scientists can use these techniques to
discover new predictive algorithms and enhance their ability to design proteins with the desired properties.
Specific aims include: (1) integrating peer-validated ML methods and proprietary technology for protein
engineering into the ProtaBank AI Platform, (2) developing dynamic ML dataset creation tools, (3) expanding
and improving the ProtaBank database by reaching out to scientists to contribute data, (4) enhancing our data
deposition tools, and (5) integrating ProtaBank with the Protein Data Bank structure database and other
databases.

## Key facts

- **NIH application ID:** 9994932
- **Project number:** 5R44GM117961-05
- **Recipient organization:** PROTABIT, LLC
- **Principal Investigator:** Barry D Olafson
- **Activity code:** R44
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $495,160
- **Award type:** 5
- **Project period:** 2016-06-01 → 2022-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9994932

## Citation

> US National Institutes of Health, RePORTER application 9994932, Advancing Protein Engineering Using Artificial Intelligence and the ProtaBank Mutation Database (5R44GM117961-05). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9994932. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
