# Methods for Analysis of Genomic Data with Auxiliary Information

> **NIH R21** · MAYO CLINIC ROCHESTER · 2022 · $229,191

## Abstract

The broad, long-term objective of this project is to develop novel statistical methods and bioinformatics tools for genomic data analytics, with application to individualized medicine. Genomic data analysis in medicine has two goals: to identify genomic biomarkers of clinical outcomes, and to build genomic biomarker-based predictive models for disease prevention, diagnosis and prognosis. Both tasks suffer from insufficient statistical power. Functional genomics studies have produced an enormous amount of data about the structure and function of genomic elements. Integrating such auxiliary data into the analysis of genomic data could increase both the power and the interpretability of the analysis. However, methods for integrating auxiliary data into genomic data analysis remain underdeveloped. This proposal aims to develop novel statistical methods for auxiliary data integration in three fundamental statistical problems.

Aim 1 focuses on developing a covariate-adaptive family-wise error rate (FWER) control procedure that integrates auxiliary data. Existing procedures use the auxiliary information but not the p-value distributional information. The proposed procedure improves on them by accounting for both simultaneously.

Aim 2 focuses on developing a structure-adaptive high-dimensional regression model for flexible integration of auxiliary data into prediction. The method translates the auxiliary information into different penalization strengths for the regression coefficients. Because it imposes a “soft” constraint on the regression coefficients, it is expected to be more robust to mis-specified or less informative auxiliary information.

Aim 3 proposes a two-stage false discovery rate (FDR) control procedure for more powerful confounder adjustment in genomic association analysis. Genomic data are subject to various confounding effects due to demographic, environmental, biological and technical factors, and adjusting for these confounders can substantially reduce statistical power. The two-stage approach improves on the power of traditional adjusted analyses in two steps. It first uses the unadjusted test statistics as auxiliary information to filter out less promising features. It then performs FDR control on the remaining features.

Aim 4 will develop user-friendly and efficient software packages so that the community can benefit fully from the methodological and scientific advances resulting from this application.

The proposed methods will be evaluated using simulations and, more importantly, through applications to several ongoing studies at the Center for Individualized Medicine at Mayo Clinic. The proposed quantitative methods and open-source platform will contribute to genomic biomarker discovery and genomic biomarker-based predictive medicine. All methods and bioinformatics tools developed under this grant will be made available free of charge to interested researchers and the public.
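Aim 1 describes covariate-adaptive FWER control only at a high level. The simplest standard way to let auxiliary data inform FWER control is a weighted Bonferroni procedure, which serves as a point of reference here. Each hypothesis i gets a nonnegative weight w_i, with the weights averaging 1, and H_i is rejected when p_i ≤ αw_i/m. The Python sketch below implements that baseline only. It is not the proposed procedure, which according to the abstract also uses p-value distributional information. The function name and the `scores` input are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def weighted_bonferroni(
    pvalues: Sequence[float], scores: Sequence[float], alpha: float = 0.05
) -> list[bool]:
    """Weighted Bonferroni: reject H_i when p_i <= alpha * w_i / m.

    `scores` are hypothetical nonnegative auxiliary scores (for example, a
    functional annotation score per feature); larger means more promising.
    """
    m = len(pvalues)
    if len(scores) != m:
        raise ValueError("pvalues and scores must have the same length")
    if m == 0:
        return []
    if any(s < 0 for s in scores):
        raise ValueError("scores must be nonnegative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    # Rescale scores into weights that sum to m (average 1). If the weights
    # are fixed independently of the p-values, the union bound gives
    # FWER <= sum over true nulls of alpha * w_i / m <= alpha.
    weights = [m * s / total for s in scores]
    return [p <= alpha * w / m for p, w in zip(pvalues, weights)]
```

Consider scores [3, 1, 0, 0] and p-values [0.02, 0.02, 0.3, 0.8] at α = 0.05. The first hypothesis is rejected, because its threshold is 0.0375. Equal scores reduce the procedure to plain Bonferroni, with a threshold of 0.0125, and nothing is rejected. The FWER guarantee holds only if the weights are chosen without looking at the p-values being tested.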
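A weighted lasso can illustrate the Aim 2 idea of translating auxiliary information into coefficient-specific penalization strengths. It minimizes (1/2n)‖y − Xβ‖² + λ Σ_j v_j|β_j|, where a smaller penalty factor v_j encodes stronger prior support for feature j. Any finite v_j still lets a coefficient enter the model when the data support it strongly enough. The penalty therefore acts as a “soft” constraint rather than a hard include/exclude rule. The pure-Python coordinate-descent sketch below assumes the penalty factors are already given. The abstract does not specify how the proposed structure-adaptive model derives them from auxiliary data, and all names here are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    penalty_factors: Sequence[float],
    lam: float,
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> list[float]:
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j v_j |b_j|.

    `penalty_factors` (v_j >= 0) are assumed to come from auxiliary data
    elsewhere; a smaller v_j means weaker shrinkage of feature j.
    No intercept is fitted, so y and the columns of X should be centered.
    """
    n, p = len(y), len(penalty_factors)
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column stays at 0
            # Correlation of column j with the partial residual (j left out).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * penalty_factors[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

Consider a small orthogonal design whose least-squares coefficients are (2, 1). With λ = 0.5 and equal factors (1, 1), both coefficients shrink, giving (1.5, 0.5). With factors (0, 4), the first coefficient stays unpenalized at 2 and the second is set to 0. For real analyses, the R package glmnet accepts per-coefficient weights of this kind through its `penalty.factor` argument.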
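The two-stage idea in Aim 3 can be sketched generically. First, keep only the features whose unadjusted p-value passes a filter threshold. Then apply the Benjamini-Hochberg (BH) step-up procedure to the confounder-adjusted p-values of the retained features. This is a plain filter-then-BH illustration with a hypothetical fixed threshold, not the proposed procedure. In particular, BH on a filtered set is guaranteed to control FDR only under suitable conditions on how the stage-one filter statistic relates to the stage-two statistic under the null. The sketch does not check those conditions.

```python
from __future__ import annotations

from typing import Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """BH step-up: reject the k smallest p-values, k = max{r : p_(r) <= alpha*r/m}."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # keep the largest passing rank (step-up rule)
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def two_stage_fdr(
    unadjusted_p: Sequence[float],
    adjusted_p: Sequence[float],
    filter_threshold: float,
    alpha: float = 0.05,
) -> list[bool]:
    """Stage 1: filter on unadjusted p-values; stage 2: BH on adjusted ones.

    `filter_threshold` is a hypothetical fixed cutoff chosen for illustration.
    """
    if len(unadjusted_p) != len(adjusted_p):
        raise ValueError("both p-value lists must have the same length")
    # Stage 1: keep features whose unadjusted evidence looks promising.
    kept = [i for i, p in enumerate(unadjusted_p) if p <= filter_threshold]
    # Stage 2: BH over the retained features only, so m = len(kept).
    kept_rejections = benjamini_hochberg([adjusted_p[i] for i in kept], alpha)
    rejected = [False] * len(adjusted_p)
    for i, r in zip(kept, kept_rejections):
        rejected[i] = r
    return rejected
```

Take unadjusted p-values [0.001, 0.002, 0.5, 0.9, 0.003] and adjusted p-values [0.004, 0.035, 0.045, 0.2, 0.025]. Filtering at 0.01 keeps features 0, 1 and 4, and BH at α = 0.05 rejects all three. BH applied to all five adjusted p-values rejects only feature 0.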

## Key facts

- **NIH application ID:** 10415152
- **Project number:** 5R21HG011662-02
- **Recipient organization:** MAYO CLINIC ROCHESTER
- **Principal Investigator:** Jun Chen
- **Activity code:** R21
- **Funding institute:** National Human Genome Research Institute (NHGRI), NIH
- **Fiscal year:** 2022
- **Award amount:** $229,191
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2021-06-01 → 2024-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10415152

## Citation

> US National Institutes of Health, RePORTER application 10415152, Methods for Analysis of Genomic Data with Auxiliary Information (5R21HG011662-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10415152. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*