Grammar-Driven Genomic Data Visualization

NIH RePORTER · NIH · R01 · $607,236

Abstract

Project Summary

Our rapidly evolving understanding of how genomes function and how genomic variation influences the development and progression of diseases drives our ability to develop novel diagnostics and therapeutics. Genomic data science plays a critical role in this process, relying on the availability of computational tools for statistical and visual analysis of large-scale and complex data sets. The growing genomics workforce that relies on these tools includes scientists with a broad set of expertise and needs. Experimental scientists use tools with graphical user interfaces to interpret their data; computational biologists write pipelines or code for ad hoc analysis in interactive environments; and software developers build sophisticated data portals and other web-based tools. While a large number of genomic data visualization tools for these audiences exist, there is a lack of a unified approach that would allow a larger audience to design and implement their own interactive data visualization tools for genomic data. To address this gap, we will develop a visualization framework based on a novel grammar for interactive, scalable visualization of genome-mapped data. The visualizations defined using this grammar will be interactive, responsive, and scalable. These features will be enabled by rendering the visualizations using an extension of HiGlass, our framework for genomic data visualization that supports multi-scale data visualization and multiple linked views. The grammar design will be guided by a taxonomy of genomic visualizations and visual analysis tasks that comprehensively describe the space of interactive visualizations currently in use for genomic data. The grammar will support the creation of visualizations with different genome layouts, visual encodings of data, and flexible configurations of multiple linked views.
Furthermore, we will incorporate a taxonomy of metadata visualizations, for example, of phenotypic data, that are frequently linked to genomic data. To create visualizations based on the proposed grammar, a JavaScript library, a Python package, an R package, and an interactive visualization editor will be developed. This editor will be web-based and have a drag-and-drop interface for data and visualization components. In addition to the genomic visualization grammar, our framework will also contain a genomic visualization recommendation system that can generate interactive visualizations based on a description of a data set and the analysis tasks that the user intends to accomplish. This will enable novices to create effective visualizations without knowledge of visualization design. The recommendation system will also accelerate visual analysis for more experienced users, as the visualization design can be automated and customized. The recommendation system will be available through the R and Python packages and the interactive visualization editor. In addition to producing visualization designs using our p...

Key facts

NIH application ID: 10452031
Project number: 1R01HG011773-01A1
Recipient: HARVARD MEDICAL SCHOOL
Principal Investigator: Nils Gehlenborg
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $607,236
Award type: 1
Project period: 2022-06-15 → 2026-03-31