# Deep learning methods for automated and accurate reconstruction of protein structures from cryo-EM image data

> **NIH R01** · UNIVERSITY OF MISSOURI-COLUMBIA · 2022 · $305,657

## Abstract

Cryogenic electron microscopy (cryo-EM) has emerged as a major experimental technology for determining protein
structures, having reached atomic resolution (1.2-4 Å) in recent years. Compared to traditional techniques (i.e.,
X-ray crystallography and nuclear magnetic resonance), cryo-EM has the unique capability of determining the
quaternary structures of large protein complexes and assemblies that are difficult or impossible for those
techniques to handle. The advance of cryo-EM technology has stimulated a revolution in structural biology,
enabling the study of large protein complexes and assemblies that could not be studied well before. However,
the computational reconstruction of
protein structures from cryo-EM image data is still a time-consuming, labor-intensive, error-prone, and often
inaccurate process, due to the bottleneck in picking protein particles in cryo-EM images, substantial noise in 3D
cryo-EM density maps generated from particle images, and lack of automated and accurate methods to build
protein structures from density maps. We plan to develop advanced deep learning methods to reconstruct protein
structures automatically and accurately from cryo-EM image data, leveraging the large amount of
high-resolution cryo-EM data accumulated in the field and the latest advances in deep learning technology.

We will develop 2D transformer networks, which are built on the attention mechanism and perform better than
traditional convolutional and recurrent neural networks in image processing, to pick single protein particles
accurately and automatically in cryo-EM image data via a novel combination of unsupervised and supervised
learning. Moreover, we formulate the problem of denoising 3D cryo-EM density maps generated from 2D particle
images as a novel machine learning problem and will develop both 3D deep autoencoders and
rotation-/translation-equivariant transformer networks to remove noise in cryo-EM density maps. Furthermore, we will
develop end-to-end 3D rotation-/translation-equivariant networks to directly identify the backbone atoms of
proteins from 3D density maps without using any known structure as template, which will be used by a novel
hidden Markov model to build the high-resolution full-atom structures of any protein. The methods will be
rigorously evaluated on a large amount of cryo-EM data and compared with existing methods. All these
methods will be integrated together to create a fully automated machine learning pipeline, the first of its kind in
the field, to reconstruct protein structures more accurately from cryo-EM images than existing methods. We will
implement the individual deep learning methods as well as the entire pipeline as open-source packages released
on GitHub for the community to use. We will further validate the tools and pipeline by applying them to the new
cryo-EM data of a group of important membrane protein complexes (i.e., ion channels) to be generated at the
Brookhaven National Laboratory.
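
For readers unfamiliar with the attention mechanism that the proposed particle-picking transformers build on, the following is a minimal illustrative sketch of scaled dot-product self-attention over a handful of image-patch feature vectors. It is not the project's implementation: the function names, the toy `patches` values, and the use of identity projections (queries, keys, and values all equal to the input) are assumptions made purely for illustration, and a real transformer would learn separate projection matrices.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(patches):
    """Scaled dot-product self-attention with identity projections.

    Assumption for illustration: queries, keys, and values are the patch
    vectors themselves; no learned weights are involved.
    """
    d = len(patches[0])
    scale = math.sqrt(d)
    weights = []
    for q in patches:
        # Similarity of this patch to every patch, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in patches]
        weights.append(softmax(scores))
    # Each output is a weighted average of all patch vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, patches)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical toy features: patches 0 and 1 are similar, patch 2 differs.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(patches)
```

In this toy input, patch 0 assigns more attention weight to the similar patch 1 than to the dissimilar patch 2, which is the property that lets attention relate distant but similar regions of an image.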

## Key facts

- **NIH application ID:** 10459829
- **Project number:** 1R01GM146340-01
- **Recipient organization:** UNIVERSITY OF MISSOURI-COLUMBIA
- **Principal Investigator:** Jianlin Cheng
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $305,657
- **Award type:** 1
- **Project period:** 2022-09-20 → 2026-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10459829

## Citation

> US National Institutes of Health, RePORTER application 10459829, Deep learning methods for automated and accurate reconstruction of protein structures from cryo-EM image data (1R01GM146340-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10459829. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
