Learning a molecular shape space for the adaptive immune system

NIH RePORTER · NIH · R35 · $367,139

Abstract

Project Summary: The adaptive immune system consists of highly diverse B- and T-cell receptors, which can recognize and neutralize a multitude of diverse pathogens. Immune recognition relies on molecular interactions between immune receptors and pathogens, which in turn are determined by the complementarity of their 3D structures and amino acid compositions, i.e., their shapes. Immune shape space has been previously introduced as an abstraction for such molecular recognition to explain how immune repertoires are organized to counter diverse pathogens. However, the relationships between immune receptor sequence, shape, and specificity are very difficult to quantify in practice. We propose to use recent advances in machine learning and the wealth of molecular data to infer an effective shape space, grounded in the biophysics of protein interactions. The key is to find a representation of proteins in general, and of immune receptors in particular, that reflects the relevant biophysical properties that determine a protein receptor’s stability, function, and interaction with pathogens. Representation learning is a powerful technique in machine learning that uses large amounts of data to infer a reduced representation. Since protein function is closely related to 3D structure, we will develop novel machine learning methods that use the atomic coordinates of a protein structure as input and, through transformations that respect the physical symmetries in the data, learn representations that reflect biophysical properties of proteins and protein-protein interactions. We believe a key innovation in our approach is the analysis of amino acid neighborhoods within 3D protein structures. The distribution of these neighborhoods will reveal how they differ at the surface, in the bulk, and at functionally important regions such as catalytic sites.
The learned protein representation will enable us to characterize how specific compositions of amino acid neighborhoods are the building blocks of protein structure and protein function. We will transfer the representation of the protein universe to immune receptors to learn the immune shape space. The learned immune shape space will enable us to address how affinity and specificity are encoded by immune receptors in different cell types. We will study how the modular structure of immune receptors, with separate pathogen-engaging and framework regions, enables receptors to diversify and target a multitude of pathogens without compromising their stability. We will use the complementary aspect of shape recognition to predict the antigenic targets of the immune receptors, and through collaborations, we will experimentally validate our predictions. Our approach opens a new path towards interpretable computational models of proteins and immune receptors that describe how biological properties and biological function emerge from protein subunits. Additionally, the inferred molecular representations can be used as a generative mode...

Key facts

NIH application ID: 10865002
Project number: 5R35GM142795-04
Recipient: UNIVERSITY OF WASHINGTON
Principal Investigator: Armita Nourmohammad
Activity code: R35
Funding institute: NIH
Fiscal year: 2024
Award amount: $367,139
Award type: 5
Project period: 2021-08-15 → 2026-06-30