# SOFTWARE FOR LARGE-SCALE INFERENCE OF THE GENETICS OF LIFESTYLE MEASURES, BIOMARKERS, AND COMMON AND RARE DISEASES

> **NIH R01** · STANFORD UNIVERSITY · 2022 · $392,500

## Abstract

Large-scale population biobanks around the world, the disease-focused NHGRI Genome Sequencing Program
(GSP), and the United States’ All of Us Precision Medicine Initiative project will generate massive genomic
datasets combined with disease outcomes and other health measurements. These genomic studies will
identify genomic variants relevant to health and disease. However, how each association relates to the full set
of identified associations will remain unclear if the datasets are analyzed separately. There is a growing
recognition that most traits are polygenic. In addition, it is increasingly appreciated that pleiotropy is pervasive.
Due to privacy concerns, it is challenging to share all possible genotype and phenotype data. Methods that can
perform inference on summary-level data, e.g., p-values, effect size estimates, and allele frequencies, will facilitate our
understanding of the genetics of human diseases and health. Here, we propose to develop software for
large-scale inference of the genetics of lifestyle measures, biomarkers, and common and rare
diseases. Achieving this goal requires expertise in medical and population genetics, statistical methods
development, and management of large-scale databases. The project has three main objectives.
First, we will create Global Biobank Engine: a powerful, interactive web platform for inference of the
genetics of lifestyle measures, biomarkers, and common and rare diseases. We will expand the features by
implementing quality control visualizations and methods for flagging variants and phenotypes. We will add
tools for study design that use empirical data to estimate statistical power, and create a flexible framework for
statistical models that jointly analyze multiple phenotypes while controlling for false-positive and false-negative
findings. Second, we will improve Global Biobank Engine performance, scalability, and accessibility
to facilitate future population biobanks and targeted common and rare disease studies. We will create a hosted,
secure, and cost-effective cloud-based community resource, and design a database system that reduces the
loading time for genetic association studies from hours to minutes and allows for streaming of statistical
algorithms directly to genetic data. Lastly, we will improve genomic interpretation, visualization, and data
sharing to dramatically increase the rate of translational discoveries by implementing novel analysis
methods. We will support new variant annotation methods and integrate coding and non-coding information,
including data from large-scale epigenomics studies, for variant- and gene-level inference. We will implement
new Bayesian statistical models written in probabilistic programming languages, sparse canonical
correlation analysis, and truncated singular value decomposition. PI Rivas and his team have ample
experience with NIH-funded consortia, and they are dedicated to the overall mission of NIH and its funded
investigators to uncover new knowledge that ...

## Key facts

- **NIH application ID:** 10440494
- **Project number:** 5R01HG010140-05
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** Manuel Rivas
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $392,500
- **Award type:** 5
- **Project period:** 2018-09-06 → 2023-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10440494

## Citation

> US National Institutes of Health, RePORTER application 10440494, SOFTWARE FOR LARGE-SCALE INFERENCE OF THE GENETICS OF LIFESTYLE MEASURES, BIOMARKERS, AND COMMON AND RARE DISEASES (5R01HG010140-05). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10440494. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*