Physiological characteristics relevant to breast screening, including breast density and tumor markers, exhibit statistically significant variation across racial and ethnic groups. However, this metadata is currently unavailable in mammogram screening datasets, which poses a significant challenge. The gap can impair the performance of AI algorithms, which often struggle with out-of-distribution data, that is, cases from populations underrepresented in their training data. Our goal for this project is to create a comprehensive, longitudinally linkable breast cancer screening reference dataset encompassing race/ethnicity demographics, pathology and genotyping reports, and multimodal imaging follow-ups. Such a reference dataset would enable software-as-a-medical-device manufacturers to better train and evaluate their breast cancer screening algorithms.