# Development and application of machine learning-based approaches for estimating disease risk in diverse and admixed populations.

> **NIH R43** · GALATEA BIO INC · 2024 · $399,156

## Abstract

Polygenic Risk Scores (PRS) quantify the genetic component of an individual’s risk of eventually
developing a particular phenotype (generally a complex disease). Together with more
traditional clinical risk prediction metrics, they are a crucial component of future personalized
medicine protocols because they inform the optimal timing of monitoring, preventative testing, and
potential lifestyle modifications. While many PRS models have been proposed, they suffer from
two main drawbacks. Specifically, PRS models for the same disease have not been
systematically tested and evaluated for effectiveness using uniform, repeatable protocols; and
existing models are generally optimized for risk prediction in European ancestry individuals, and
are much less accurate when applied to people from other populations. This proposal will
address these shortcomings by:

1) Evaluating the performance of existing PRS for European ancestry individuals in a fair and consistent manner using both public (UK Biobank) and proprietary data
2) Generating an improved ancestry reference panel that includes deeper representation of indigenous American groups
3) Developing novel machine learning PRS models applicable to individuals with diverse and admixed ancestry
4) Implementing a combined ancestry estimation + PRS application that is commercially available through AWS or DNAnexus

Together, this proposed work will ensure that the benefits of improved disease risk prediction
are available to all individuals, regardless of ancestral background.
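
As background for the abstract above, a conventional additive PRS is commonly computed as a weighted sum of risk-allele dosages, with one effect weight per variant. The sketch below illustrates only that generic formulation; it is not the machine learning model proposed in this grant, whose details the abstract does not give. The function name, weights, and dosages are hypothetical values chosen for illustration.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PRS: sum over variants of (effect weight * risk-allele dosage)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical per-variant effect weights (for example, log odds ratios).
example_weights = [0.5, -0.25, 1.0]
# Hypothetical risk-allele counts (0, 1, or 2) for one individual.
example_dosages = [2, 1, 0]

example_score = polygenic_risk_score(example_dosages, example_weights)
print(example_score)
```

In this formulation the weights are typically estimated in a discovery cohort, which is one reason scores derived mostly from European ancestry data can transfer poorly to other populations, as the abstract notes.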

## Key facts

- **NIH application ID:** 10921442
- **Project number:** 1R43HG013628-01
- **Recipient organization:** GALATEA BIO INC
- **Principal Investigator:** Carlos Daniel Bustamante
- **Activity code:** R43 (SBIR Phase I)
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $399,156
- **Award type:** 1
- **Project period:** 2024-09-19 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10921442

## Citation

> US National Institutes of Health, RePORTER application 10921442, Development and application of machine learning-based approaches for estimating disease risk in diverse and admixed populations. (1R43HG013628-01). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10921442. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
