# Biology-aware machine learning methods for characterizing microbiome genotype and phenotype

> **NIH R35** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2023 · $14,817

## Abstract

The Mirarab laboratory designs computational methods for answering biological and biomedical questions,
focusing on scalability and accuracy. These methods span several areas (e.g., microbiome profiling, multiple
sequence alignment, and phylogenomics), and a common thread among them is evolutionary modeling. More
recently, many of the developed methods are based on machine learning. The lab has developed scalable and
accurate methods for reconstructing evolutionary histories (i.e., phylogenies) and using these histories in
downstream biomedical applications. Methods developed by this lab (e.g., ASTRAL, SEPP, DEPP) are at the
forefront of modern genome-wide phylogenetics. While the lab has previously focused more on inferring species
histories, through a MIRA grant, it has shifted its focus to developing methods for microbiome analyses, which
pose their own unique set of challenges.
 As part of the MIRA application, the Mirarab lab will focus on designing, testing, and applying improved
methods for statistical analyses of microbiome data. These methods will target two questions. (i) Profiling:
What organisms constitute a given sample? (ii) Association: How are samples different in their organismal
composition, and how do these differences connect to measurable characteristics of their environment? While
both questions have been subject to considerable research, many computational challenges remain, providing
an opportunity for better methods to make a significant impact. Rather than focusing solely on new algorithms,
the lab will also work on building better reference datasets and combining data from multiple sources. Thus, the
project aims to harness unprecedented computational power, large available datasets, and recent advances
in machine learning to dramatically improve the state of the art. The project will not use off-the-shelf machine
learning methods in a black-box fashion. Instead, it will develop methods that incorporate biological knowledge
(e.g., of evolutionary relationships) into machine learning in a principled, biologically motivated
fashion.
 Within the context of the MIRA award, this supplement requests support for an undergraduate
student who is considering a biomedical research career, providing research experience at the
intersection of mathematics/algorithmics and biology.

## Key facts

- **NIH application ID:** 10810437
- **Project number:** 3R35GM142725-02S1
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Siavash Mir arabbaygi
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $14,817
- **Award type:** 3
- **Project period:** 2021-09-15 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10810437

## Citation

> US National Institutes of Health, RePORTER application 10810437, Biology-aware machine learning methods for characterizing microbiome genotype and phenotype (3R35GM142725-02S1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10810437. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
