Core C. Bioinformatics and data sharing.

NIH RePORTER · NIH · P01 · $350,426

Abstract

Although more than 20 years have passed since the genome of Mycobacterium tuberculosis (Mtb) was first sequenced, the functions of over half of its genes remain unknown or unclear. Sources of information beyond homology are needed to better understand the specific biological roles that Mtb genes play in basic cellular processes and infection. In this program project, four disease-relevant pathways will be studied: metabolic response to acidic stress, biotin biosynthesis, RNA processing, and cell division. The Projects will rely heavily on four experimental methods: transposon sequencing (TnSeq; essentiality), RNAseq (transcriptomics), Activity-Based Metabolomic Profiling (ABMP), and CRISPR interference (CRISPRi). TnSeq and RNAseq will be applied to knockout or knockdown strains of target genes to assess the intracellular responses to these perturbations and to infer novel gene functions and pathway associations. Sequencing of transposon insertion libraries (TnSeq) provides a powerful method for probing the functions of genes and the relationships among them through conditional essentiality and genetic interactions. CRISPRi will be used to generate knockdown strains to validate phenotypes predicted by TnSeq. Transcriptomic data yield information on co-expression, which can be used to infer functional relationships, e.g., through regulatory networks. ABMP provides direct information on gene function through changes in metabolite concentrations (e.g., possible substrates or products) when purified protein is incubated in lysate. The role of the Bioinformatics and Data Sharing Core is to assist the Projects with rigorous statistical analyses of these data. Specifically, in Activity 1 the Core will conduct bioinformatic analysis of the various 'omics datatypes to identify novel genes in the target pathways and define their functional roles.
In Activity 2 the Core will apply machine learning algorithms to integrate these diverse 'omics datatypes and build predictive models that can be used to identify new genes in these pathways and predict their functions. Finally, in Activity 3 the Core will serve as a centralized conduit for data sharing, which includes depositing datasets in appropriate public repositories and posting data on an Mtb-dedicated website that the Core has developed.

Key facts

NIH application ID
10024703
Project number
1P01AI143575-01A1
Recipient
WEILL MEDICAL COLL OF CORNELL UNIV
Principal Investigator
SABINE EHRT
Activity code
P01
Funding institute
NIH
Fiscal year
2020
Award amount
$350,426
Award type
1