Under-diagnosis occurs when an individual living with a disease has not received a diagnosis. The reasons for under-diagnosis are often complex and context-specific, and its extent may vary across sensitive population subgroups, leading to disparities in care. Electronic Health Records (EHRs) contain a wealth of patient health information, and diagnosed and under-diagnosed patients may have similar EHR profiles that differ from those of condition-free patients. EHRs therefore provide a unique opportunity to address under-diagnosis in the standard healthcare setting. Fully exploiting this opportunity is challenging, however, because under-diagnosed patients are hidden among the much larger number of condition-free patients. Noting that patients who have been diagnosed with the condition can be identified from EHRs, we propose that EHR data, when enriched with additional disease labels from small-scale targeted screening, allow the development of data-driven approaches to identifying under-diagnosed patients and assessing disparities in under-diagnosis. To this end, we will develop a suite of statistical and machine learning methods, with accompanying software tools, to address under-diagnosis. Our methods will enable (1) a risk-based approach to identifying patients in EHRs who are most likely to have a missed diagnosis (Aim 1); (2) unbiased comparison between diagnosed and under-diagnosed patients to understand disparities in under-diagnosis (Aim 2); and (3) leveraging of existing models and targeted screening data to address under-diagnosis in a new clinical setting (Aim 3). We will apply the developed methods to address under-diagnosis of Primary Aldosteronism and Familial Hypercholesterolemia using EHR data from Penn Medicine and the VA.