# Core C. Bioinformatics and data sharing.

> **NIH P01** · WEILL MEDICAL COLL OF CORNELL UNIV · 2022 · $320,274

## Abstract

 Even though it has been over 20 years since the genome of Mycobacterium tuberculosis (Mtb) was first
sequenced, the functions of over half the genes in the genome remain unknown (or unclear). Additional sources
of information (beyond homology) are needed to generate a better understanding of the specific biological roles
Mtb genes play in basic cellular processes and infection. In this program project, four disease-relevant pathways
will be studied: metabolic response to acidic stress, biotin biosynthesis, RNA processing, and cell division. The
Projects will rely heavily on four experimental methods: transposon sequencing (TnSeq; essentiality),
RNAseq (transcriptomics), Activity-Based Metabolomic Profiling (ABMP), and CRISPR interference (CRISPRi).
TnSeq and RNAseq will be applied to knockout or knockdown strains of target genes to assess the intracellular
responses to these perturbations and infer novel gene functions and pathway associations. Sequencing of
transposon insertion libraries (TnSeq) provides a powerful method for probing the functions and relationships
among genes through conditional essentiality and genetic interactions. CRISPRi will be used to generate
knockdown strains to validate phenotypes predicted by TnSeq. Transcriptomic data yield information on
co-expression, which can be used to infer functional relationships, e.g., through regulatory networks. ABMP provides
direct information on gene functions through changes in metabolite concentrations (e.g. possible substrates or
products) when purified protein is incubated in lysate.
 The role of the Bioinformatics and Data Sharing Core is to assist the Projects with rigorous statistical
analyses of these data. Specifically, in Activity 1 the Core will conduct bioinformatic analysis of various 'omics
datatypes to identify novel genes in the target pathways and define their functional roles. In Activity 2 the Core
will apply Machine Learning algorithms to integrate these diverse 'omics datatypes and build predictive models
that can be used to identify new genes in these pathways and predict their functions. Finally, under Activity 3 the
Bioinformatics and Data Sharing Core will serve as a centralized conduit for data sharing, including depositing
datasets in appropriate public repositories and posting data on an Mtb-dedicated website that it has developed.

## Key facts

- **NIH application ID:** 10426178
- **Project number:** 5P01AI143575-03
- **Recipient organization:** WEILL MEDICAL COLL OF CORNELL UNIV
- **Principal Investigator:** SABINE EHRT
- **Activity code:** P01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $320,274
- **Award type:** 5
- **Project period:** 2020-06-12 → 2025-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10426178

## Citation

> US National Institutes of Health, RePORTER application 10426178, Core C. Bioinformatics and data sharing. (5P01AI143575-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10426178. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
