PROJECT SUMMARY
Multiple artificial intelligence (AI) technologies are now commercially available for automated interpretation of screening mammography. These AI technologies hold promise for improving screening performance and outcomes for the 40 million U.S. women who undergo routine breast cancer screening each year. Federal regulatory approval of new AI technologies requires only a demonstration of accuracy non-inferior to that of existing computer-aided detection systems in small, retrospective reader studies, but their widespread clinical translation is contingent upon more robust population-based evaluation. Specifically, the impact of these AI technologies on actual patient outcomes needs to be assessed, including whether they lead to improved detection of clinically meaningful cancers in the general screening population. Robust external validation of AI algorithms for mammography screening has thus far been limited by use of single-institution datasets not representative of the entire target population, use of AI algorithms that are not publicly available, comparison to radiologist performance in enriched case sets, limited follow-up time for cancer diagnoses, which affects ground-truth labels, and evaluation on 2D digital mammography rather than 3D digital breast tomosynthesis (DBT) exams. Our study objective is to conduct a comparative evaluation of five commercially available AI technologies for automated DBT screening interpretation that overcomes all of these limitations and then to estimate the long-term benefits, harms, and costs of AI-driven DBT screening at the U.S. population level. Specifically, we will 1) use a centralized honest-broker, model-to-data infrastructure to perform an independent, external validation of five leading commercial AI technologies for DBT screening using prospectively collected data obtained from eight diverse U.S. regional breast imaging registries; 2) stratify AI vs.
radiologist performance by detailed woman-, exam-, radiologist-, and tumor-level characteristics to inform targeted algorithm training and refinement efforts that ensure generalizability of the AI algorithms; 3) explore targeted approaches for improving clinical workflow efficiency by using AI to safely triage exams highly likely to be negative; and 4) use a validated breast cancer microsimulation model to determine the population-level, long-term health benefits, harms, and costs associated with AI technologies for DBT screening, both as a standalone screening tool and as an independent second reader alongside radiologist interpretation. Our proposed study will represent the most objective and rigorous evaluation of deep learning algorithms for DBT screening interpretation in the U.S. to date. Our results will provide urgently needed evidence to inform key stakeholders including women, physicians, payers, industry partners, and policymakers regarding how to maximize the value of AI technologies for DBT screening prior to their widespread clinical t...