PROJECT SUMMARY
Genome-wide association studies (GWAS) have identified numerous genetic loci associated with almost all complex human diseases. Much of this success, particularly the accelerated findings in recent years, is credited to the development of deeply phenotyped population biobanks with matched genomic data. However, a crucial limitation of these population biobanks is the often-insufficient number of disease cases for late-life health outcomes, which is why the introduction of GWAS-by-proxy (GWAX) served as a landmark in the field. The GWAX study design is based on a simple idea: although biobank participants may not yet have their own diagnoses of late-life diseases, they report their parents' diagnoses through the family health history survey; as biological children, they also (indirectly) provide parental genetic data. Since its introduction, GWAX has been widely used in genetic studies of many diseases, and particularly often for neurodegenerative diseases. Every recent Alzheimer’s disease (AD) GWAS has performed a meta-analysis combining case-control associations with GWAX proxy associations to boost sample size and statistical power. However, methodological issues in GWAX and the quality of its association results have not been carefully investigated. We demonstrate pervasive biases in current GWAX approaches that cause substantial divergence between GWAS and GWAX results. In addition, we demonstrate that education is an important social factor at the center of many of these biases. Because cognition is such a crucial marker for AD, biases caused by education and cognition become particularly important in AD genetics research and will give completely misleading results if not handled properly.
Our proposal takes advantage of the extensive family health history data available in the All of Us Research Program and of recent statistical advances, developed by our investigator team, for decomposing social genetic effects using summary statistics from multi-generational GWAS. We aim to expand these methods to rigorously and comprehensively characterize the biases in current GWAX results. Our central hypothesis is that GWAX associations based on family health history as a proxy for disease phenotypes are substantially affected by survival bias and by non-random over- and under-reporting of family members' illnesses, and that they lead to erroneous results and conclusions in analyses that naively combine these associations with case-control GWAS results. Successful completion of this proposal will improve scientific understanding of the genetic underpinnings of family health history, shed important light on the design and analysis of middle-aged biobank cohorts, and provide novel analytical strategies for future genetic studies leveraging family health history data in population biobanks.