Development and application of machine learning-based approaches for estimating disease risk in diverse and admixed populations.

NIH RePORTER · NIH · R43 · $399,156

Abstract

Polygenic risk scores (PRS) quantify the genetic component of an individual's risk of eventually developing a particular phenotype, generally a complex disease. Along with more traditional clinical risk prediction metrics, they are a crucial component of future personalized medicine protocols, informing the timing of monitoring, preventative testing, and potential lifestyle modifications. While many PRS models have been proposed, they suffer from two main drawbacks. First, PRS models for the same disease have not been systematically tested and evaluated for effectiveness using uniform, repeatable protocols. Second, existing models are generally optimized for risk prediction in individuals of European ancestry and are much less accurate when applied to people from other populations. This proposal will address these shortcomings by (1) evaluating the performance of existing PRS for European ancestry individuals in a fair and consistent manner using both public (UK Biobank) and proprietary data; (2) generating an improved ancestry reference panel that includes deeper representation of Indigenous American groups; (3) developing novel machine learning PRS models applicable to individuals with diverse and admixed ancestry; and (4) implementing a combined ancestry estimation and PRS application that is commercially available through AWS or DNAnexus. Together, this proposed work will ensure that the benefits of improved disease risk prediction are available to all individuals, regardless of ancestral background.
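
For readers unfamiliar with the term, a conventional PRS is usually computed as a weighted sum of an individual's effect-allele dosages (0, 1, or 2 copies per variant), with weights taken from association studies. The sketch below illustrates only that conventional additive form; it is not the machine learning approach proposed in this project, and the function name, variant IDs, and weights are hypothetical values chosen for illustration.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive PRS: sum over variants of effect weight times effect-allele dosage."""
    # Assumption: variants absent from the individual's genotype are skipped.
    # Real pipelines often impute missing genotypes instead.
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical variant IDs and per-allele effect weights (not real data).
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 1.0}

# Hypothetical individual: number of effect alleles carried at each variant.
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_risk_score(person, weights)
print(score)
```

Because the weights in such a score are typically estimated in one ancestry group, the same sum can rank risk poorly in another group, which is the second drawback the abstract describes.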

Key facts

NIH application ID: 10921442
Project number: 1R43HG013628-01
Recipient: GALATEA BIO INC
Principal Investigator: Carlos Daniel Bustamante
Activity code: R43
Funding institute: NIH
Fiscal year: 2024
Award amount: $399,156
Award type: 1
Project period: 2024-09-19 → 2025-08-31