Methods for Analysis of Genomic Data with Auxiliary Information

NIH RePORTER · NIH · R21 · $229,191

Abstract

Project Summary

The broad, long-term objective of this project is to develop novel statistical methods and bioinformatics tools for genomic data analytics, with application to individualized medicine. Two goals of genomic data analysis in medicine are to identify genomic biomarkers of clinical outcomes and to build genomic biomarker-based predictive models for disease prevention, diagnosis and prognosis. Both tasks face the challenge of insufficient statistical power. Functional genomics studies have produced an enormous amount of data about the structure and function of genomic elements. Integrating such auxiliary data into the analysis of genomic data could increase both the power and the interpretability of the analysis. However, methods for integrating auxiliary data into genomic data analysis remain under-developed. This proposal aims to develop novel statistical methods for auxiliary data integration for three fundamental statistical problems.

Aim 1 focuses on developing a covariate-adaptive family-wise error rate control procedure that integrates auxiliary data. The procedure improves on existing procedures by accounting for the auxiliary information and the p-value distributional information simultaneously, whereas existing procedures do not use the p-value distributional information. Aim 2 focuses on developing a structure-adaptive high-dimensional regression model for flexible integration of auxiliary data into prediction. The method translates the auxiliary information into different penalization strengths for the regression coefficients. Because it imposes a “soft” constraint on the regression coefficients, it is expected to be more robust to mis-specified or less informative auxiliary information. Aim 3 proposes a two-stage false discovery rate (FDR) control procedure for more powerful confounder adjustment in genomic association analysis. Genomic data are subject to various confounding effects due to demographic, environmental, biological and technical factors, and confounder adjustment substantially reduces statistical power. The two-stage approach improves the power of traditional adjusted analyses by using the unadjusted test statistics as auxiliary information to filter out less promising features and then performing FDR control on the remaining features. Aim 4 will develop user-friendly and efficient software packages so that the community can benefit maximally from the methodological and scientific advances resulting from this application.

The proposed methods will be evaluated using simulations and, more importantly, through applications to several ongoing studies at the Center for Individualized Medicine at Mayo Clinic. The proposed quantitative methods and open-source platform will contribute to genomic biomarker discovery and genomic biomarker-based predictive medicine. All methods and bioinformatics tools developed under this grant will be made available free of charge to interested researchers and the public.
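
As a rough illustration of the filter-then-control idea described for Aim 3, the sketch below keeps only features whose unadjusted p-value passes a screening threshold and then applies the standard Benjamini-Hochberg step-up procedure to the confounder-adjusted p-values of the surviving features. This is not the project's actual procedure, which the abstract does not specify. The function names, the `filter_threshold` parameter and the toy p-values are all hypothetical, and whether a naive filter of this kind preserves FDR control depends on how the screening statistic relates to the adjusted statistic under the null.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices (into pvalues) rejected by the BH step-up procedure."""
    m = len(pvalues)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


def two_stage_fdr(unadjusted_p, adjusted_p, filter_threshold=0.05, alpha=0.05):
    """Hypothetical two-stage sketch: screen on unadjusted p-values,
    then run BH on the adjusted p-values of the retained features only."""
    if len(unadjusted_p) != len(adjusted_p):
        raise ValueError("p-value lists must have the same length")
    # Stage 1: drop features that look unpromising in the unadjusted analysis.
    kept = [i for i, p in enumerate(unadjusted_p) if p <= filter_threshold]
    # Stage 2: FDR control restricted to the retained features.
    rejected_local = benjamini_hochberg([adjusted_p[i] for i in kept], alpha)
    # Map positions within the filtered list back to original feature indices.
    return [kept[j] for j in rejected_local]


# Toy p-values, invented purely for illustration.
unadjusted = [0.001, 0.6, 0.004, 0.7, 0.9, 0.5, 0.8, 0.3]
adjusted = [0.02, 0.4, 0.03, 0.6, 0.9, 0.7, 0.2, 0.5]

plain = benjamini_hochberg(adjusted, alpha=0.05)
two_stage = two_stage_fdr(unadjusted, adjusted, filter_threshold=0.05, alpha=0.05)
print("BH on all features:", plain)
print("Two-stage sketch:", two_stage)
```

In this toy input, BH applied to all eight adjusted p-values rejects nothing, while the two-stage sketch rejects features 0 and 2 because the multiplicity burden shrinks from eight tests to two. The gain comes entirely from the screening step, which is why its statistical validity needs the careful treatment the proposal refers to.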

Key facts

NIH application ID: 10415152
Project number: 5R21HG011662-02
Recipient: MAYO CLINIC ROCHESTER
Principal Investigator: Jun Chen
Activity code: R21
Funding institute: NIH
Fiscal year: 2022
Award amount: $229,191
Award type: 5
Project period: 2021-06-01 → 2024-04-30