Deep Curation via an Integrated Whole-Cell Computational Model

NIH RePORTER · NIH · R01 · $381,923

Abstract

The generation of biological data is rapidly presenting us with one of the most demanding data analysis challenges the world has ever faced, not only in terms of storage and accessibility but, perhaps more critically, in terms of the data's extensive heterogeneity and variability. In this proposal, we present a new approach to these challenges, which we call “Deep Curation”: a large-scale, integrated modeling approach that simultaneously cross-evaluates millions of heterogeneous data points against one another. The word “deep” reflects the multiple layers of curation we perform: layers not only for the data, but also for the parameters derived from these data, the mathematical equations, the unified model, and the simulation output. The deeply curated model is thus an invaluable tool for processing, curating and analyzing data automatically. Our proposed efforts in Deep Curation are based on a computer model of Escherichia coli that accounts for the function of roughly 40% of the well-annotated genes and is built on an extensive set of diverse measurements compiled from thousands of reports (currently in a second round of review at Science). The goal of this proposal is to expand this model to enable Deep Curation of data related to growth on more than 100 currently unincorporated environments. We can then assess the cross-consistency of the datasets simultaneously, as a unified whole, identifying critical areas in which datasets are not cross-consistent and where further experimental investigation is therefore needed.
The Significance of this proposal is that Deep Curation represents a first-in-kind leap forward in our ability to exploit massively heterogeneous, variable and complex biological datasets; that it automates and accelerates transformative biomedical discovery; that we will create a bi-directional pipeline between EcoCyc, the most comprehensive database on any organism, and the most complex biological model in existence; and that whole-cell modeling is a rapidly growing field with transformative potential as it advances toward more complex cells and groups of cells.
The Innovation associated with this proposal is that Deep Curation is a brand-new approach that is not currently available to any other lab in the world; that the proposed work will produce a dramatically expanded whole-cell model of previously unseen complexity, as well as novel and highly innovative modeling technology; that we include explicit curation of knowledge regarding mechanism in addition to data; and that the automated communication between the EcoCyc database and the E. coli model will dramatically expand the capacity, scope and visibility of both in a synergistic way.
Our Specific Aims are: Aim 1 (Curation), build the Data and Parameter layers related to E. coli growth on diverse environments; Aim 2 (Modeling), implement the Equation, Model and Simulation layers; Aim 3 (Deep Curation), use the integrated model to cross-evaluate the un...

Key facts

NIH application ID: 9986442
Project number: 1R01LM013229-01A1
Recipient: STANFORD UNIVERSITY
Principal Investigator: Markus W Covert
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $381,923
Award type: 1
Project period: 2020-05-01 to 2024-02-29