# Interpretable and extendable deep learning model for biological sequence analysis and prediction

> **NIH R35** · UNIVERSITY OF MISSOURI-COLUMBIA · 2020 · $378,183

## Abstract

Bioinformatics and computational biology have become core to biomedical research. The work of the PI, Dr. Dong Xu,
in this area focuses on the development of novel computational algorithms, software, and information
systems, as well as on broad applications of these tools and other informatics resources to diverse biological
and medical problems. He works on many research problems in protein structure prediction, post-translational
modification prediction, high-throughput biological data analyses, in silico studies of plants, microbes, and
cancers, biological information systems, and mobile app development for healthcare. He has published more
than 300 papers, with about 12,000 citations and an h-index of 55. In this project, the PI proposes to develop
deep-learning algorithms, tools, and web resources for analyses and predictions of biological sequences, including
DNA, RNA, and protein sequences. The availability of these data provides emerging opportunities for precision
medicine and other areas, while deep learning, as a cutting-edge technology in machine learning, presents a
powerful new method for analyses and predictions of biological sequences. With rapidly accumulating
sequence data and the fast development of deep-learning methods, there is an urgent need to systematically
investigate how best to apply deep learning in sequence analyses and predictions. For this purpose, the PI will
develop cutting-edge deep-learning methods with the following goals for the next five years:

(1) Develop a series of novel deep-learning methods and models that specifically target biological
sequence analyses and predictions, covering: (a) general unsupervised representations of DNA/RNA, protein, and
SNP/mutation sequences that capture both local and global features for various applications; (b) methods to
make deep-learning models interpretable for understanding biological mechanisms and generating
hypotheses; (c) “rule learning”, which abstracts the underlying “rules” by combining unsupervised learning on
large unlabeled datasets with supervised learning on small labeled datasets so that the model can classify new unlabeled data.

(2) Apply the proposed deep-learning models to DNA/RNA sequence annotation, genotype-phenotype
analyses, cancer mutation analyses, protein function/structure prediction, protein localization prediction, and
protein post-translational modification prediction. The PI will exploit particular properties associated with each
of these problems to improve the deep-learning models. He will develop a set of related prediction and analysis
tools, which will improve on state-of-the-art performance and shed some light on related biological mechanisms.

(3) Make the data, models, and tools freely accessible to the research community. The system will be
designed to be modular and open source, and available through GitHub. Its components will be available like
integrated-circuit modules, universal and ready to plug in to different applications. The PI will develop a web resource
for b...
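
As an illustration of goal (1c), combining a small labeled set with a larger unlabeled set, the following is a minimal sketch and not the project's method. It swaps in two simple stand-ins: normalized k-mer frequencies in place of a learned sequence representation, and self-training (pseudo-labeling) with a nearest-centroid classifier in place of a deep model. All function names, the `min_margin` threshold, and the toy sequences are assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: DNA input; k-mers with other symbols are ignored


def kmer_vector(seq, k=2):
    """Return normalized k-mer frequencies of seq as a fixed-length list."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit(pool):
    """Compute one centroid per label from (vector, label) pairs."""
    groups = defaultdict(list)
    for vec, label in pool:
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(centroids, vec):
    """Return (label, margin): nearest centroid and its gap to the runner-up."""
    ranked = sorted((distance(c, vec), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(labeled, unlabeled, k=2, min_margin=0.1, rounds=3):
    """Grow the labeled pool with confidently pseudo-labeled sequences."""
    pool = [(kmer_vector(s, k), y) for s, y in labeled]
    remaining = [kmer_vector(s, k) for s in unlabeled]
    for _ in range(rounds):
        centroids = fit(pool)
        confident, rest = [], []
        for vec in remaining:
            label, margin = predict(centroids, vec)
            (confident if margin >= min_margin else rest).append((vec, label))
        if not confident:
            break  # nothing new was labeled with enough confidence
        pool.extend(confident)
        remaining = [vec for vec, _ in rest]
    return fit(pool)


def classify(centroids, seq, k=2):
    """Label a new sequence with the nearest centroid."""
    return predict(centroids, kmer_vector(seq, k))[0]


# Toy data (hypothetical): two labeled and four unlabeled sequences.
labeled = [("ATATATATAT", "AT-rich"), ("GCGCGCGCGC", "GC-rich")]
unlabeled = ["ATATTATAAT", "GCGGCGCCGC", "TATATATATA", "CGCGCGCGCG"]
model = self_train(labeled, unlabeled)
print(classify(model, "TTATATAATA"), classify(model, "CCGCGGCGCG"))
```

The margin threshold keeps ambiguous sequences out of the pseudo-labeled pool, which limits how far an early mistake can propagate through later rounds.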

## Key facts

- **NIH application ID:** 9925232
- **Project number:** 5R35GM126985-03
- **Recipient organization:** UNIVERSITY OF MISSOURI-COLUMBIA
- **Principal Investigator:** DONG XU
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $378,183
- **Award type:** 5
- **Project period:** 2018-05-01 → 2023-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9925232

## Citation

> US National Institutes of Health, RePORTER application 9925232, Interpretable and extendable deep learning model for biological sequence analysis and prediction (5R35GM126985-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9925232. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*