# Improving PGS Prediction for Underrepresented Groups Through Transfer Learning

> **NIH R01** · HENRY FORD HEALTH + MICHIGAN STATE UNIVERSITY HEALTH SCIENCES · 2024 · $478,037

## Abstract

In the last two decades, thousands of Genome-Wide Association Studies (GWAS) have been published.
Increasingly, the findings reported by these studies inform the development of Polygenic Scores (PGS) that can
be used to predict phenotypes and disease risk. The Polygenic Scores Catalog includes more than 3,700 PGS.
However, the overwhelming majority of the PGS were derived using data from Europeans and have poor
predictive performance when used to predict phenotypes of individuals of non-European ancestry.

Transfer Learning (TL) is a technique by which knowledge gained in one data set is used to improve a
model’s performance in another data set. Our overarching goal is to develop novel TL algorithms to improve the
prediction accuracy of PGS for ancestry groups underrepresented in genomics research.

To achieve this, we propose three specific aims. Our first Aim is to Develop and Benchmark Novel
Penalized and Bayesian methods for PGS development using Transfer Learning. The first method that we
propose is a Penalized Regression that shrinks estimates towards external estimates of SNP effects
(e.g., SNP effects derived from Europeans). We develop coordinate descent algorithms to fit Penalized
Regressions using Ridge, Lasso, and Elastic Net penalties. The second model we propose is a Bayesian
Regression with a mixture prior that uses external estimates as prior means and can automatically
learn, for each SNP, whether to transfer knowledge from the external estimates and how strongly to borrow
information. We present preliminary results (using data from the UK-Biobank and All of Us (AoU)) that
demonstrate the potential of the proposed methods. Our Aim 1 research will deliver efficient open-source
software to develop PGS using TL and extensive benchmarks using data from the UK-Biobank, AoU, and three US
cohorts (ARIC, REGARDS, and HCHS/SOL).
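
As an illustration of the first method, the following is a minimal sketch of coordinate descent for a penalized regression that shrinks SNP effects towards external estimates. The abstract does not give the objective function, so the formulation here is an assumption: writing `b = b_ext + d`, an Elastic Net penalty is applied to the deviation `d`, which yields Ridge-type shrinkage at `alpha=0` and Lasso-type shrinkage at `alpha=1`. The names `tl_elastic_net`, `b_ext`, `lam`, and `alpha` are hypothetical and are not taken from the project's software.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def tl_elastic_net(X, y, b_ext, lam, alpha=0.5, n_iter=200, tol=1e-8):
    """Coordinate descent for an assumed transfer-learning objective:

        0.5 * ||y - X (b_ext + d)||^2
            + lam * (alpha * ||d||_1 + 0.5 * (1 - alpha) * ||d||_2^2)

    Returns b = b_ext + d, i.e. effects shrunk towards the external estimates.
    """
    n, p = X.shape
    d = np.zeros(p)                 # deviation from the external estimates
    resid = y - X @ b_ext           # residual at d = 0
    col_ss = (X ** 2).sum(axis=0)   # x_j' x_j for each SNP
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:    # skip SNPs with no variation
                continue
            # x_j' (partial residual that excludes SNP j)
            rho = X[:, j] @ resid + col_ss[j] * d[j]
            d_new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1.0 - alpha))
            delta = d_new - d[j]
            if delta != 0.0:
                resid -= X[:, j] * delta   # keep the residual in sync
                d[j] = d_new
                max_change = max(max_change, abs(delta))
        if max_change < tol:        # stop when a full sweep changes nothing
            break
    return b_ext + d
```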

Recent studies suggest that a sizable fraction of the loss of accuracy (LOA) in cross-ancestry prediction is
attributable to genome differentiation (i.e., between-ancestry differences in allele frequencies and Linkage
Disequilibrium). We hypothesize that genome differentiation (and thus the portability of local PGS) varies
substantially over the genome. Therefore, in Aim 2, we propose to Develop and Validate Maps of the Relative
Accuracy (RA) of European-derived PGS when used to predict phenotypes of African Americans and Latinos.
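
To make the idea of a Relative Accuracy map concrete, here is a small sketch under stated assumptions. The abstract does not define RA formally; the code assumes the commonly used ratio of squared prediction correlations (target group over reference group) and computes it for local PGS built from the SNPs of each region. The function names and the `regions` mapping (region label to SNP column indices) are hypothetical, and the project may instead predict RA from measures of genome differentiation.

```python
import numpy as np


def relative_accuracy(y_ref, pgs_ref, y_tgt, pgs_tgt):
    """Assumed definition: R^2 in the target group divided by R^2 in the reference group."""
    r2_ref = np.corrcoef(y_ref, pgs_ref)[0, 1] ** 2
    r2_tgt = np.corrcoef(y_tgt, pgs_tgt)[0, 1] ** 2
    return r2_tgt / r2_ref


def local_pgs(X, beta, idx):
    """Local PGS: the part of the score contributed by the SNPs in one region."""
    return X[:, idx] @ beta[idx]


def ra_map(X_ref, y_ref, X_tgt, y_tgt, beta, regions):
    """Relative accuracy of each region's local PGS, keyed by region label.

    A region whose local PGS has zero variance in either group yields NaN.
    """
    return {
        name: relative_accuracy(
            y_ref, local_pgs(X_ref, beta, idx),
            y_tgt, local_pgs(X_tgt, beta, idx),
        )
        for name, idx in regions.items()
    }
```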

Finally, Aim 3 focuses on Integrating Relative Accuracy Maps developed in Aim 2 into Transfer Learning
Algorithms that can achieve strong transfer of knowledge for genomic regions that exhibit limited genome
differentiation between populations (i.e., high predicted RA) and weaker TL for regions with low predicted RA.
We propose strategies for this in the Penalized and Bayesian models developed in Aim 1. Further, we also offer
an approach to use the Bayesian model that we will develop in Aim 3 to leverage sex-by-ancestry differen...
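
One simple way to picture RA-dependent transfer in the penalized setting is a Ridge-type regression with a SNP-specific penalty that grows with predicted RA. This is an illustrative sketch only: the abstract does not say how RA enters the models, so the linear mapping `lam_j = lam_max * ra_j` and the names `ra_weighted_ridge`, `ra`, and `lam_max` are assumptions. The assumed objective is `0.5 * ||y - X b||^2 + 0.5 * sum_j lam_j * (b_j - b_ext_j)^2`, which has a closed-form solution.

```python
import numpy as np


def ra_weighted_ridge(X, y, b_ext, ra, lam_max):
    """Ridge-type shrinkage towards external estimates with RA-dependent strength.

    SNPs in regions with high predicted RA get a large penalty (strong transfer);
    SNPs in regions with low predicted RA get a small one (weak transfer).
    """
    ra = np.clip(np.asarray(ra, dtype=float), 0.0, 1.0)  # per-SNP predicted RA in [0, 1]
    lam = lam_max * ra                                   # assumed linear RA-to-penalty map
    # Normal equations of the assumed objective:
    # (X'X + diag(lam)) b = X'y + diag(lam) b_ext
    A = X.T @ X + np.diag(lam)
    rhs = X.T @ y + lam * b_ext
    return np.linalg.solve(A, rhs)
```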

## Key facts

- **NIH application ID:** 10983102
- **Project number:** 1R01HG013794-01
- **Recipient organization:** HENRY FORD HEALTH + MICHIGAN STATE UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Gustavo de los Campos
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $478,037
- **Award type:** 1
- **Project period:** 2024-09-19 → 2025-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10983102

## Citation

> US National Institutes of Health, RePORTER application 10983102, Improving PGS Prediction for Underrepresented Groups Through Transfer Learning (1R01HG013794-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10983102. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*