# Informatics Infrastructure and Bioinformatics Analysis

> **NIH P01** · UNIVERSITY OF CALIFORNIA, SAN FRANCISCO · 2023 · $434,727

## Abstract

This program project will involve high-throughput experiments, including whole genome sequencing (WGS),
RNA sequence profiling on cell populations (RNA-seq), perturbation RNA profiling (Perturb-seq), and CRISPR
screens. These studies will generate substantial quantities of data requiring secure, reliable storage and
computational analysis. Data derived from human research participants require a data security level suitable
for their sensitive nature. Secure and sufficient computational and information resources are crucial for
facilitating efficient, consistent, and reproducible analyses for all projects. Storing all data from the program
project centrally enables data provenance tracking, data sharing, and integration. Shared computing supports
consistent analysis pipelines, with a single resource responsible for maintaining current versions of databases
and software. A bioinformatics specialist performing routine analyses will allow researchers in each project to
focus on novel research endeavors. We plan to support the three project components and the three other
cores with the following specific aims:

Aim 1: Provide informatics infrastructure to enable reproducible and secure data analyses. Core B will
provide a secure and reliable informatics infrastructure to support foundational data analyses for the projects. A
key component of the computing environment will be a server with sufficient compute cycles and memory for
both routine analyses and novel research activities of the projects. This secure environment will include online
disk storage for all raw and processed project data and analysis code. This core will implement a well-designed
backup system including regular onsite and offsite backup of project data. Software packages and necessary
public and licensed databases will be deployed and kept current. The entire compute environment will be
secured by a modern firewall, with systems accessed via VPN. A system administrator will maintain the
computing system, including backup, and will support users from the collaborating institutes.

Aim 2: Perform routine bioinformatics analyses for each Project. A bioinformatics specialist in Core B will
provide basic computational analyses for genome and transcriptome data. Statistical researchers in Core B will
develop innovative, robust statistical methods for Perturb-seq in Project 3 and creative pipelines for all studies.
To enable consistent and reproducible analyses for all projects, this core will deploy a robust software
infrastructure to manage all data, record its provenance, and track all results. The core will perform
preprocessing, quality control, annotation, and other routine analysis steps for WGS, transcriptome
sequencing, CRISPR screens, and characterization of specificity and efficiency of gene correction. Core B will
also help maintain a public resource of genes and regulatory regions related to T cell deficiencies. Core B will
facilitate researchers in all projects and cores with ...

## Key facts

- **NIH application ID:** 10691236
- **Project number:** 5P01AI138962-04
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
- **Principal Investigator:** Steven E Brenner
- **Activity code:** P01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $434,727
- **Award type:** 5
- **Project period:** 2020-09-08 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10691236

## Citation

> US National Institutes of Health, RePORTER application 10691236, Informatics Infrastructure and Bioinformatics Analysis (5P01AI138962-04). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10691236. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
