Microbiomes, the diverse communities of microorganisms found in environments such as soil, water, and the human body, are fundamental to many natural processes, including carbon cycling, nutrient cycling, and ecosystem health. To better understand microbiomes, the scientific community has invested heavily in sequencing and multi-omics technologies, generating vast amounts of microbiome-derived data that offer valuable insights into microbial diversity and function. Despite these advances, studying soil microbiomes remains particularly challenging because soils host the most diverse microbial communities on Earth, which leads to sparse and fragmented coverage of that diversity. This coverage challenge is further exacerbated by small sample sizes and inconsistent collection of the different data types, making it difficult to develop comprehensive models that generalize across environments. This project aims to address the sparsity of soil metagenomic data by developing the Multi-Modality Microbiome Foundation Model (M3FM), an artificial intelligence foundation model that integrates microbiome data across studies and data types and leverages all existing public data sets to provide a more comprehensive representation of soil microbiomes. M3FM will use a self-supervised learning approach to leverage the large number of public data sets from diverse sources witho