# Multi-view self-supervised deep learning for biological sequences and beyond

> **NIH R35** · UNIVERSITY OF MISSOURI-COLUMBIA · 2024 · $391,250

## Abstract

Deep learning (DL) has demonstrated both breadth and depth in solving fundamental biological problems. DL-based
approaches, such as AlphaFold2 for 3D protein structure prediction, have become widely accepted by the biology
community. The Xu lab has been at the forefront of developing novel DL algorithms, software, and information
systems for diverse biological and medical problems. During the current project period, the Xu lab has made
excellent progress in addressing urgent challenges in developing DL methods for biological sequence analysis
and prediction, as well as other bioinformatics problems. This R35 project has produced 31 papers covering
topics ranging from protein sequence-based prediction to drug design, molecular dynamics simulation, and
single-cell data analysis. It has also provided more than ten open-source tools and three major web-based
resources to the community.
The rapid development of new DL techniques and the Xu lab's accumulating expertise in this field bring new
opportunities for shaping DL for molecular biology. The supervised DL methods currently in wide use in biomedical
research often lack enough cleanly and accurately labeled training data and may generalize poorly. Self-supervised
learning (SSL), an emerging approach that learns informative representations by exploiting relationships between
different perspectives of the data without human annotation, is becoming a new trend. These different data
perspectives are broadly called views, hence the term multi-view. Multi-view SSL techniques allow us to generate
joint or coordinated representations for single-modal and multimodal data with stronger generalizability, better
robustness, and less bias. Although SSL has achieved great success in other fields, it has been only minimally
explored in biology.
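Multi-view SSL is often trained with a contrastive objective that pulls together the embeddings of two views of the same sample and pushes apart the views of different samples. The abstract does not say which loss the project uses, so the following is only a minimal NumPy sketch of one common choice, a symmetric InfoNCE (NT-Xent-style) loss, assuming each view has already been encoded into a fixed-length embedding. The function name `info_nce_loss` and the default temperature are illustrative assumptions, not part of the project.

```python
import numpy as np


def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss for paired embeddings of two views (illustrative sketch).

    Row i of z1 and row i of z2 are assumed to be two views of the same sample
    (the positive pair); every other row in the batch serves as a negative.
    """
    # L2-normalise each embedding so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (n, n) similarity matrix

    def cross_entropy_diag(l: np.ndarray) -> float:
        # Row-wise log-softmax, stabilised by subtracting each row's maximum
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        # Positive pairs sit on the diagonal
        return float(-np.mean(np.diag(log_probs)))

    # Average the view-1 -> view-2 and view-2 -> view-1 directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    # Matched views should give a lower loss than deliberately mismatched ones
    print(info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape)))
    print(info_nce_loss(z, np.roll(z, 1, axis=0)))
```

The temperature controls how sharply the loss concentrates on the most similar negatives; 0.1 is a placeholder rather than a value taken from this project.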
This renewal project will develop a multi-view SSL framework that can handle both single-view and multi-view
data and can perform single or multiple tasks. It will tackle key challenges and bottlenecks in applying SSL to
biological studies, such as selecting effective views and data augmentations, fusing multimodal data or data
from heterogeneous sources, and integrating biological constraints into SSL models. We will focus on designing
a biology-informed system, enhancing generalizability and robustness, and making the results biologically
interpretable with assessable confidence. The Xu lab will apply and refine the framework in multiple mainstream
biology applications: anti-CRISPR protein prediction, exploring various data augmentation methods for protein
sequences; ion and small-ligand binding prediction, using complementary views of protein sequences and
structures; and single-cell data analyses across different conditions. The framework will also be tested for
broad applications in sequence-based studies and beyond, such as alignment-free methods for constructing
phylogenetic trees ...
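The abstract mentions exploring data augmentation methods for protein sequences but does not name them. As an illustrative assumption only, two simple augmentations are random residue masking and random residue substitution; applying them independently to one sequence yields two views that a contrastive objective could treat as a positive pair. The helpers `augment_protein` and `two_views`, the `X` mask token, and the default rates are all hypothetical.

```python
from __future__ import annotations

import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "X"  # hypothetical placeholder token for masked positions


def augment_protein(seq: str, mask_rate: float = 0.1, sub_rate: float = 0.05,
                    rng: random.Random | None = None) -> str:
    """Return an augmented copy of `seq` (hypothetical helper).

    Each position is masked with probability `mask_rate`, replaced by a
    different standard residue with probability `sub_rate`, and left
    unchanged otherwise. Sequence length is always preserved.
    """
    if mask_rate < 0 or sub_rate < 0 or mask_rate + sub_rate > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    rng = rng or random.Random()
    out = []
    for residue in seq:
        r = rng.random()
        if r < mask_rate:
            out.append(MASK)
        elif r < mask_rate + sub_rate:
            # Substitute with a residue that differs from the original
            out.append(rng.choice([a for a in AMINO_ACIDS if a != residue]))
        else:
            out.append(residue)
    return "".join(out)


def two_views(seq: str, seed: int = 0) -> tuple[str, str]:
    """Produce two independently augmented views of the same sequence."""
    rng = random.Random(seed)
    return augment_protein(seq, rng=rng), augment_protein(seq, rng=rng)
```

In a full pipeline, both views would pass through a shared encoder before a contrastive loss is computed. Which augmentations actually help for a given protein task is an empirical question, in line with the view and augmentation selection challenge described above.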

## Key facts

- **NIH application ID:** 10895278
- **Project number:** 5R35GM126985-07
- **Recipient organization:** UNIVERSITY OF MISSOURI-COLUMBIA
- **Principal Investigator:** DONG XU
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $391,250
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2018-05-01 → 2028-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10895278

## Citation

> US National Institutes of Health, RePORTER application 10895278, Multi-view self-supervised deep learning for biological sequences and beyond (5R35GM126985-07). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10895278. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
