A Unified High Performance Web Service for Systems Genetics and Precision Medicine

NIH RePORTER · NIH · R01 · $476,764

Abstract

We are developing and improving powerful statistical and genetic tools to analyze and integrate massive omics data sets jointly with information on disease risk and severity. This work will enable far better use and re-use of complex and massive omics data sets and software by a wide community of users, ranging from students, researchers, and clinical scientists to expert data scientists and statisticians. We are building modular high-performance computational resources as part of a web services framework called GeneNetwork 2 (GN2). GN2 provides efficient data uploading and access and a suite of QC and analysis code that can be used or adapted for any species. Code is written in Python, C++, and R, and is supported by a relational database (MySQL) that incorporates the largest coherent collection of expression quantitative trait locus (eQTL) data. GN2 is optimized to handle a new generation of complex genetic crosses, including heterogeneous stock, hybrid diversity panels, GWAS cohorts, and sets of recombinant inbred strains such as the BXD and Collaborative Cross. GN2 includes new code for comparative and translational analysis of eQTL data sets and network graphs. In this grant we extend GN2 in four specific ways: far more capable data entry and export APIs and workflows, QC, and simulation routines (Aim 1); new high-performance tools for the analysis of complex cross populations and for comparative and translational analysis of systems genetics data sets (Aim 2); a new plug-in application programming interface (API) architecture with backend use of GPU web service systems (Aim 3); and statistical methods for correlated high-dimensional data and predictive Bayesian modeling (Aim 4). We anticipate that this open and scalable architecture and modular code will become a core resource for both molecular biologists and data scientists, particularly those working in predictive modeling and precision medicine.
All members of our team work closely with the systems genetics community and are training the next generation of young scientists interested in scalable integrative models of disease risk and treatment.

Key facts

NIH application ID: 9904711
Project number: 5R01GM123489-04
Recipient: UNIVERSITY OF TENNESSEE HEALTH SCI CTR
Principal Investigator: SAUNAK SEN
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $476,764
Award type: 5
Project period: 2017-04-15 → 2021-03-31