Collaborative Research: A mathematical framework for scalable species tree estimation via site pattern scoring schemes

NSF Award Search · 01002627DB NSF RESEARCH & RELATED ACTIVIT · $299,846

Abstract

High-quality genome data are being produced at a pace that is changing evolutionary biology. These data can clarify how species are related, but they also bring a hard problem: evolution is not uniform across the genome. This project develops new mathematics and algorithms for estimating species histories from whole-genome data while allowing a separate history for each region. The approach builds on a recently proposed, fast, and accurate method that uses simple scores computed from patterns in the data but is not yet well understood theoretically. By explaining why its scores work, finding better scores, and extending the method to additional settings, the project will make genome-scale evolutionary analysis more accurate, scalable, and robust. The resulting tools will be distributed as open software and taught through software schools. The project will also train students at the interface of mathematics, computer science, and biology. Its long-term benefits include stronger tools for biological discovery, including work relevant to biotechnology, invasive species, and disease outbreaks, and new ideas for artificial intelligence and machine learning methods for analyzing large heterogeneous datasets.

This project will develop a mathematical framework for quartet-based linear scores for species tree estimation from whole-genome alignments. The starting point is CASTER, a site-based method whose empirical accuracy and scalability come from scoring site patterns over quart

Key facts

NSF award ID: 2601546
Awardee: University of California-San Diego (CA)
SAM.gov UEI: UYTTZT6G9DT1
PI: Siavash Mir arabbaygi
Primary program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), Biotechnology
Estimated total: $299,846
Funds obligated: $299,846
Transaction type: Standard Grant
Period: 09/01/2026 → 08/31/2029