Project Summary/Abstract

Molecular evolution and genomics research has entered an exciting phase with the advent of single-cell sequencing techniques, which make it possible to profile genome variation in hundreds of cells from a single individual. Evolutionary patterns and processes can now be revealed at cellular resolution. However, state-of-the-art phylogeny reconstruction methods perform poorly on cellular sequencing data because the number of genetic variants is small, owing to low mutation rates and short time spans. Cellular sequence alignments are frequently tall, i.e., they contain a small number of variants (columns) and a large number of sequences (cells, rows). A common feature of these tall datasets is sequencing error, which arises from the technical challenges of single-cell sequencing. Even low rates of sequencing error make inferred cellular phylogenies unreliable and lead to erroneous downstream biological inferences. We will develop innovative methods for molecular evolutionary and phylogenetic analysis of tall data for studying somatic and pathogen evolution. Specifically, we aim to (a) develop a mutation ordering and phylogeny estimation (MOPE) framework to infer tall-data phylogenies accurately, (b) integrate MOPE with traditional phylogenetic methods to further increase the accuracy of evolutionary inferences, and (c) develop a software library for high-throughput analysis of tall data. Ultimately, the proposed software and research developments will advance molecular evolution, genomics, bioinformatics, and biomedicine. New software and its source code will be made freely available for research, education, and training.