An estimated 6.2 million Americans aged 65 and older are living with Alzheimer's disease and related dementias (AD/ADRD) in 2022. Two-thirds of them are women, and Black and Hispanic Americans have been shown to be at higher risk of AD/ADRD than white Americans. The vast majority of AD/ADRD diagnoses are made in non-specialty settings such as primary care, yet as of 2019 only 16% of seniors were regularly screened for cognitive impairment in primary care. Late diagnosis deprives patients and their families of the opportunity to receive anticipatory guidance, participate in clinical trials, or benefit from any potential disease-modifying therapy. Data sources such as MRI and electronic health records (EHR) could enable scalable monitoring of cognitive health and early detection of AD/ADRD. However, existing tools were built largely on data from white, educated populations without significant comorbidities, whereas the patients seen in real-world clinics are more diverse and medically complex. Working with real-world data, in turn, requires solving several core machine learning challenges. Here, we propose a set of novel methods that enable the use of large, real-world, multi-modal clinical datasets to build robust, unbiased, fair, and accurate models for early AD/ADRD detection in diverse populations, with an emphasis on under-represented groups. Specifically, we propose to develop novel self-supervised learning techniques that learn robust representations from large unlabeled datasets, which can then be used to design algorithmically fair models. Our proposal offers new objective functions that treat multi-modality (the pairing of T1 and FLAIR MRI, PET images, and EHR data) as an asset for training better models. This work can extend beyond AD/ADRD diagnosis to other diseases that have imaging and clinical biomarkers.