PROJECT SUMMARY/ABSTRACT: Immune receptors of the adaptive immune system (antibody/B-cell or T-cell receptors, or AIRR data) are designed to recognize and remove pathogens, and also to recognize and preserve self-molecules. Therefore, these receptors have to be highly variable; it has been estimated that the number of possible human B-cell receptors approaches 10^13. In addition to the diversity of these receptor sequences, the genes that underlie the production of these receptor molecules are highly diverse and complicated, and the data describing how these receptors bind to antigens (such as influenza) are also highly complex. Repositories to curate, analyze, and share these data are necessary to characterize B/T cell function in disease, as well as to facilitate the discovery of new vaccine leads and therapeutic monoclonal antibodies to suppress autoimmune disease and cancer. Such data repositories are available, but they tend to focus on only one aspect of the data. Because these repositories have typically been developed independently, the primary data and associated metadata (age, demography, sex, etc.) of the samples are stored in incompatible forms; in addition, the enormous size and complexity of the data make data sharing and integrated analyses extremely challenging. The goal of the proposed research is to establish the integrated AIRR Knowledge Commons (i-AKC), a novel knowledgebase that will allow seamless access, exploration, analysis, querying, and downloading of these various data types from a single point of entry. Our approach will be based on the very successful AIRR Community initiative, a group of immunologists, bioinformaticians, and experts in ethics and data sharing who have worked together since 2015 to develop protocols and standards for analysis and data sharing tools.
One of the outcomes of the AIRR Community is the AIRR Data Commons, a set of data repositories that store the immense immune receptor repertoires that underlie the adaptive immune response. The proposed research takes the next important step of integrating (1) the AIRR Data Commons with repositories of (2) antigen/receptor binding and (3) germline immune genes. The steps to produce the i-AKC are to (1) develop a common data model and establish common data elements relying on existing ontologies and community standards, (2) integrate the data using innovative algorithms and automation tools and enrich it with new knowledge derived from algorithms operating on the integrated data, and (3) engage in community building. Using the i-AKC, researchers will, for example, be able to discover receptor sequences based on metadata or sequence searches, seamlessly examine information on the germline genes underlying these receptor sequences, or examine what is known about the binding targets of these receptors. This novel knowledgebase will facilitate data and knowledge exploration that would be prohibitively difficult using sets of “siloed” repositories and will gre...