# Bioinformatics and Biostatistics Core

> **NIH P30** · YALE UNIVERSITY · 2022 · $331,208

## Abstract

The Biostatistics and Bioinformatics Core (BBC) supports the statistical, bioinformatic, and computational
needs of the Discovery and Targeted Proteomics Cores, as well as Center Investigators, their postdoctoral
associates and students, and Pilot Grant awardees. The BBC has four inter-related Specific Aims: 1)
Biostatistics; 2) Bioinformatics; 3) High Performance Computing; 4) Training and Education. In Aim 1,
we will provide statistical guidance on experimental design and data analysis, including sample quality
assessment and exploratory analysis for a wide range of proteomics data types; continue to develop
and add more features/functions to ProteomicsBrowser, a proteomics data analysis and visualization tool
developed by the BBC to assist Center users in better interpreting complex proteomics data; develop and
implement novel statistical methods to impute missing information in proteomics data; and develop and
implement an online tool for proteomics data preprocessing, including data normalization, batch effect
correction, and missing data imputation. In Aim 2, we will provide advanced bioinformatics software and
approaches to assist Center investigators and Pilot Grant awardees in fully interpreting their comparative
protein and protein post-translational modification profiling data; we will leverage information from single-cell
RNA-seq by incorporating stochastic expression at the cell-type-level into an analytic framework to deconvolve
tissue-level transcriptomics (RNA-seq) into fractions of constituent cell types for individual samples, and
identify genes and cell types showing significant discrepancies between RNA and protein levels; and develop a
unified computational framework for the detection of allele-specific peptides and allele-specific events from
existing whole-genome sequencing, whole-transcriptome sequencing, and proteomics data generated from
post-mortem human brain samples. In Aim 3, we will provide continued support for large-scale peptide
sequence alignment and support novel pipelines to integrate genomic, transcriptomic, and proteomic datasets;
work closely with the bioinformatics and biostatistics teams to help benchmark, scale, optimize, and speed up
computing tasks involving large-scale data analyses and database queries; and explore alternatives to
traditional high-performance computing environments, such as container systems and private cloud computing.
In Aim 4, we will provide training and education in biostatistics, bioinformatics, databases, and high-performance
computing through interaction and collaboration with the Center investigators, including working closely with
the Yale Medical Library Bioinformatics Support Program and other Yale organizations.
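
The preprocessing steps named in Aim 1 (data normalization and missing data imputation) can be illustrated with a minimal sketch. This is not the BBC's online tool, ProteomicsBrowser, or the novel imputation methods the abstract proposes; it assumes log2-scale intensities, uses simple median centering for normalization, and fills missing values with each protein's minimum observed value minus a fixed shift (a common heuristic when values are assumed to be missing because they fall below the detection limit). The function names, the `shift` parameter, and the input layout are hypothetical. Batch effect correction is not covered here.

```python
from statistics import median


def median_normalize(samples):
    """Median-center each sample (assumed log2 intensities; None = missing).

    `samples` maps a sample name to a list of intensities, one per protein,
    in the same protein order for every sample (hypothetical layout).
    Each sample is shifted so its median equals the median of all sample medians.
    """
    medians = {
        name: median(v for v in values if v is not None)
        for name, values in samples.items()
    }
    target = median(medians.values())
    return {
        name: [None if v is None else v - medians[name] + target for v in values]
        for name, values in samples.items()
    }


def impute_protein_minimum(samples, shift=1.0):
    """Fill missing values with the protein's minimum observed value minus `shift`.

    Assumption: values are missing because they are below the detection limit.
    Proteins with no observed value in any sample are left as None.
    """
    names = list(samples)
    n_proteins = len(samples[names[0]])
    imputed = {name: list(samples[name]) for name in names}
    for i in range(n_proteins):
        observed = [samples[n][i] for n in names if samples[n][i] is not None]
        if not observed:
            continue  # nothing to base an estimate on
        fill = min(observed) - shift
        for n in names:
            if imputed[n][i] is None:
                imputed[n][i] = fill
    return imputed


if __name__ == "__main__":
    # Toy data: three samples, three proteins, one missing value.
    raw = {
        "sample_a": [10.0, 12.0, 14.0],
        "sample_b": [11.0, None, 15.0],
        "sample_c": [9.0, 11.0, 13.0],
    }
    normalized = median_normalize(raw)
    print(impute_protein_minimum(normalized))
```

Normalizing before imputing keeps the fill value on the same scale as the centered data; a production pipeline would also need to model batch structure and the missingness mechanism, which this sketch does not attempt.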

## Key facts

- **NIH application ID:** 10408094
- **Project number:** 5P30DA018343-18
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** ANGUS C. NAIRN
- **Activity code:** P30
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $331,208
- **Award type:** 5
- **Project period:** 2004-07-01 → 2025-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10408094

## Citation

> US National Institutes of Health, RePORTER application 10408094, Bioinformatics and Biostatistics Core (5P30DA018343-18). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10408094. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
