The UCSC Genome Browser

NIH RePORTER · NIH · U41 · $1,395,912

Abstract

This component describes our plans to enhance the interconnectivity of the UCSC Genome Browser and related databases with other computational groups and tools in the scientific community, maintain the high quality of the Genome Browser software and data, and provide a robust computing environment capable of supporting our developers and users. We propose three primary ways in which we plan to develop, use, and extend the data exchange standards that make it easier for other bioinformaticians to both use our data and make their own data available in the Genome Browser. We plan to further develop our widely adopted track and assembly hub systems that group together genomics files in an organized fashion and label them for browser display, in particular by extending the representation of metadata (such as biosample sources and treatments) in hubs, and expanding our search capabilities. We will continue to work with ontology groups to incorporate their controlled vocabularies into relevant fields of our metadata. We will closely collaborate with the Global Alliance for Genomics and Health (GA4GH) project to ensure that their APIs are sufficiently flexible to express our data sets and to develop a JSON-based web services API to our databases. We plan to maintain and improve the quality of our software through the continued use of good engineering practices, including the appropriate use of functional programming approaches to minimize code side effects and maximize parallel processing potential. We will continue to employ incremental, object-oriented, modular programming techniques and unit tests to maintain code quality, as well as our weekly paired-review process that ensures a thorough review of new code and helps distribute knowledge of the code base throughout our organization. 
Augmenting our engineering practices, we will continue to maintain a separate quality assurance group that applies a combination of automated and manual testing to check the quality of the software and data released on our website. The Genome Browser production and development environments are supported by several mid-range server-grade computers and a variety of storage subsystems chosen with good price/performance ratios in mind. We plan to reconfigure our system to reduce single points of failure and increase parallelism, and will reduce our need for a large compute cluster by making increased use of the cloud for large bursts of computation, such as that associated with our multiple genome alignment pipeline.

Key facts

NIH application ID: 10236410
Project number: 5U41HG002371-22
Recipient: UNIVERSITY OF CALIFORNIA SANTA CRUZ
Principal Investigator: William James Kent
Activity code: U41
Funding institute: NIH
Fiscal year: 2021
Award amount: $1,395,912
Award type: 5
Project period: 2001-07-12 → 2022-06-30