PROJECT SUMMARY
The exploration and interpretation of large, complex datasets is vital to discovery in genomics. However, researchers now confront a fundamental limitation: modern DNA sequencing technologies make unprecedented experiments possible, yet existing “genome arithmetic” algorithms and data formats for comparing and dissecting the resulting datasets cannot keep pace with the inexorable growth in dataset size and complexity. Genome arithmetic (GA) is a powerful and widely used set of techniques for exploring relationships among sets of genome features (e.g., a gene, sequence alignment, ChIP-seq peak, or anything else that can be described by chromosome coordinates). GA underlies a broad spectrum of analyses, including the detection of intersecting (overlapping) features such as sequence alignments and exons, the measurement of feature coverage among datasets, and the merging, subtraction, and complementation of feature datasets. GA functionality is used by all genome browsers and data visualization tools, and by analysis software such as GATK and SAMTOOLS. Our BEDTOOLS software has become a staple of genomics research and is used in a broad range of genomic analyses. However, continuous support and development have also revealed key limitations in its current functionality, including constraints that hinder analytical flexibility. We argue that innovations in genome arithmetic algorithms, data formats, and user-friendly software are needed to (1) empower researchers to conduct large-scale analyses with simple, flexible tools; (2) improve analysis tools so that they keep pace with the scale of modern datasets; and (3) visualize and quantify relationships among genome datasets. Therefore, the overall objective of this proposal is to provide the genomics community with innovative new algorithms and software that keep pace with modern genomics experiments and facilitate future discoveries.
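To make two of the genome arithmetic operations named above concrete, intersection and merging, a minimal Python sketch follows. It assumes BED-style zero-based, half-open intervals held in memory as hypothetical (chrom, start, end) tuples; the function names are illustrative only, and this naive approach is not how BEDTOOLS implements these operations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: (chrom, start, end), 0-based half-open as in BED.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Report the overlapping portion of every overlapping (a, b) pair."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    for ivs in by_chrom.values():
        ivs.sort()  # sort b intervals by start once per chromosome

    out: List[Interval] = []
    for chrom, start, end in sorted(a):
        for b_start, b_end in by_chrom[chrom]:
            if b_start >= end:
                break  # b is sorted by start, so no later b interval can overlap
            if b_end > start:  # half-open overlap: b_start < end and b_end > start
                out.append((chrom, max(start, b_start), min(end, b_end)))
    return out


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching (book-ended) intervals on each chromosome."""
    out: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if out and out[-1][0] == chrom and start <= out[-1][2]:
            _, prev_start, prev_end = out[-1]
            out[-1] = (chrom, prev_start, max(prev_end, end))  # extend previous run
        else:
            out.append((chrom, start, end))
    return out


exons = [("chr1", 100, 200), ("chr1", 300, 400)]
alignments = [("chr1", 150, 350), ("chr2", 0, 50)]
print(intersect(exons, alignments))  # [('chr1', 150, 200), ('chr1', 300, 350)]

peaks = [("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 250, 300),
         ("chr1", 400, 500), ("chr2", 10, 20)]
print(merge(peaks))  # [('chr1', 100, 300), ('chr1', 400, 500), ('chr2', 10, 20)]
```

The nested scan is quadratic in the worst case; it is meant only to show what each operation computes, which is precisely why scalable algorithms and data structures matter at the dataset sizes discussed here.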
The Specific Aims are to (1) develop a refined suite of genome arithmetic algorithms and a programming interface for scalable analysis with BEDTOOLS; (2) create new algorithms and genome interval sketching approaches to enable large-scale dataset comparisons; and (3) enable large-scale visualization and statistical analyses that build on our recent advances in scalable new data formats. These innovations will yield scalable new algorithms, data structures, and formats that will empower thousands of genomics researchers around the world.