GPU-accelerated high-performance computing to supercharge foundational deep learning method development for scalable and accurate prediction of protein structures

NIH RePORTER · NIH · R35 · $240,509

Abstract

PROJECT SUMMARY/ABSTRACT: This supplement aims to acquire a Dell high-performance computing (HPC) server with 8 NVIDIA H100 Graphics Processing Units (GPUs) to supercharge foundational deep learning method development for scalable and accurate prediction of protein structures, paving the way to genomic-scale computational protein modeling regardless of evolutionary relationships with previously annotated proteins. Artificial intelligence-powered methods have led to a paradigm shift in computational modeling of protein structures, yet even the most successful approaches for protein structure prediction fail to accurately predict structures of large multi-domain proteins with complex topologies or proteins with short sequences, and they depend heavily on the availability of evolutionary information, which is not always abundant, as with orphan proteins or rapidly evolving proteins. Work on structure prediction that uses single or few homologous sequences remains inaccurate and/or inefficient, limiting scaling to genomic protein databases. The latest advances in artificial intelligence, such as foundational deep learning models, hold the key to addressing these limitations. The parent R35 grant of this supplement aims to develop cutting-edge deep learning models to automate genomic-scale protein structure modeling with the key tasks of: (1) accurate de novo modeling of protein structures beyond evolutionary relatedness, even with single-sequence input; (2) high-fidelity identification of remotely homologous proteins despite low sequence similarity to previously annotated proteins; and (3) atomistic refinement of predicted protein structures to drive them towards experimental resolution in terms of stereochemical quality and side-chain positioning. Our substantial progress in the first three years of the project has demonstrated the feasibility and promise of our approach.
However, training and testing foundational deep learning models that leverage transformer neural network architectures on evolutionary-scale molecular data require a large amount of GPU computing power. Using the GPU resources currently available to us, it takes six months for a developer to complete the training and testing of one deep learning method end to end. While such a speed can yield steady progress, it is not fast enough to unleash the power of these advanced deep learning methods and realize the full potential and impact of the parent R35 project. This supplement will enable us to acquire a high-performance computing server consisting of 8 NVIDIA H100 80GB GPUs to significantly speed up the research in the parent R35 project. The requested GPUs can drastically reduce the time to complete the development of a deep learning method from about six months to less than six weeks, thus dramatically improving the productivity of the developers and in turn accelerating publication and dissemination of the methods and tools developed in this project. The large shared GPU memory will enable us to ...

Key facts

NIH application ID: 11036862
Project number: 3R35GM138146-05S1
Recipient: VIRGINIA POLYTECHNIC INST AND ST UNIV
Principal Investigator: Debswapna Bhattacharya
Activity code: R35
Funding institute: NIH
Fiscal year: 2024
Award amount: $240,509
Award type: 3
Project period: 2020-09-15 → 2025-07-31