# Bioinformatics Strategies for Genome Wide Association Studies

> **NIH R01** · UNIVERSITY OF PENNSYLVANIA · 2020 · $371,859

## Abstract

The promise of precision medicine is to edit a patient’s DNA and/or administer therapeutics targeting etiologic
molecules that prevent or reverse the disease process using a tailored design. All of this happens at the level
of the individual and requires precision knowledge of that patient’s biology. In stark contrast, much of the
knowledge we possess about genomic risk factors comes from statistical measures of association from human
populations. The conceptual and practical disconnect between the populations we study and the individuals we
want to treat is a major source of confusion about how to move forward in an era driven by genome
technology. The primary goal of this proposal is to develop novel informatics methodology and software to
facilitate precision medicine by connecting population and individual genomic phenomena. We propose here a
Virtual Genomic Medicine (VGMed) workbench where clinicians can carry out thought experiments about the
treatment of individual patients using models of disease risk derived from population-level studies. We will first
develop a novel Genomics-guided Automated Machine Learning (GAML) algorithm, accessible to clinicians, for
deriving risk models from real data (AIM 1). We will then develop a novel
simulation approach that is able to generate artificial data that preserves the distribution of genetic effects
observed in the real data while maintaining other characteristics such as genotype frequencies (AIM 2). The
simulation will generate open data allowing anyone to perform virtual interventions on patients drawn from a
population-level risk distribution. The workbench will allow editing of individual genotypes and will simulate the administration
of drugs by editing machine learning parameters in the simulation model (AIM 3). The change in risk and
disease status for the specific patient will be tracked in real time. Finally, we provide a feature in the workbench
that will allow the clinician to generate specific hypotheses about individual genetic variants that can then be
validated using integrated knowledge sources that include databases such as PubMed and ClinVar, thus giving
the user immediate feedback (AIM 4). All methods and software will be provided as open-source (AIM 5).
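
The virtual-intervention idea in AIM 2 and AIM 3 can be illustrated with a small sketch. This is not the VGMed or GAML implementation, which the abstract does not specify. It assumes a simple additive logistic risk model, and the variant names (`rs_A`, `rs_B`, `rs_C`), effect sizes, allele frequencies, and intercept are all hypothetical values chosen for illustration. It draws an artificial patient from assumed allele frequencies, edits one genotype, and reports the resulting change in predicted risk.

```python
import math
import random

# Hypothetical per-variant log-odds effects and risk-allele frequencies.
# These values are illustrative assumptions, not results from the project.
EFFECTS = {"rs_A": 0.40, "rs_B": -0.25, "rs_C": 0.10}
ALLELE_FREQ = {"rs_A": 0.30, "rs_B": 0.15, "rs_C": 0.45}
INTERCEPT = -2.0  # assumed baseline log-odds of disease


def simulate_patient(rng):
    """Draw each genotype (0, 1, or 2 allele copies) as two independent allele draws."""
    return {
        snp: sum(rng.random() < freq for _ in range(2))
        for snp, freq in ALLELE_FREQ.items()
    }


def risk(genotypes):
    """Additive logistic model: P(disease) = sigmoid(intercept + sum(beta * genotype))."""
    log_odds = INTERCEPT + sum(EFFECTS[snp] * g for snp, g in genotypes.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


def edit_genotype(genotypes, snp, new_value):
    """Return a copy of the patient with one genotype changed (a virtual intervention)."""
    if new_value not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    edited = dict(genotypes)
    edited[snp] = new_value
    return edited


def risk_change(genotypes, snp, new_value):
    """Predicted risk after the edit minus predicted risk before it."""
    return risk(edit_genotype(genotypes, snp, new_value)) - risk(genotypes)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the artificial patient is reproducible
    patient = simulate_patient(rng)
    print("genotypes:", patient)
    print("baseline risk:", round(risk(patient), 3))
    print("risk change if rs_A set to 0:", round(risk_change(patient, "rs_A", 0), 3))
```

Under the same assumptions, simulating a drug by editing model parameters would amount to changing an entry in `EFFECTS` or the `INTERCEPT` and recomputing `risk` for the same patient.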

## Key facts

- **NIH application ID:** 9886261
- **Project number:** 5R01LM010098-11
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Folkert Wouter Asselbergs
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $371,859
- **Award type:** 5
- **Project period:** 2009-09-30 → 2024-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9886261

## Citation

> US National Institutes of Health, RePORTER application 9886261, Bioinformatics Strategies for Genome Wide Association Studies (5R01LM010098-11). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9886261. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
