Use of a Machine Learning Approach to Impute Gene Expression in African Americans

NIH RePORTER · NIH · R21 · $200,000

Abstract

PROJECT SUMMARY Multi-omics data have been invaluable in understanding the potential mechanisms behind SNP associations. Paired genomic and transcriptomic data allow investigators to determine the tissue-specific effects of non-coding variation. However, most data of this type come from populations of predominantly European ancestry. Linear models that impute gene expression from genotype data have been developed, mostly from the GTEx resource, which contains paired genotype and gene expression data on 44 human tissues. Unfortunately, these models are built mostly on European data and do not perform as well in African American (AA) cohorts. To alleviate this disparity in both knowledge and data, we propose to use both our own African American paired data and public African American data to create linear and machine learning models to impute gene expression. We will then assess the utility of these models in predicting the risk of venous thromboembolism in our ACCOuNT cohort. By building on our current knowledge of transcriptome imputation, we will advance these methods to understudied admixed populations.

Key facts

NIH application ID: 10426288
Project number: 5R21HG011695-02
Recipient: NORTHWESTERN UNIVERSITY
Principal Investigator: Minoli A Perera
Activity code: R21
Funding institute: NIH
Fiscal year: 2022
Award amount: $200,000
Award type: 5
Project period: 2021-06-10 → 2024-05-31