# Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness

> **NIH R01** · JOHNS HOPKINS UNIVERSITY · 2024 · $651,982

## Abstract

Project Summary

The widespread use of RNA sequencing technology over the past decade has allowed scientists to discover a far
larger and richer repertoire of genes and transcripts encoded by the human genome than was known just a
decade ago. At least 90% of human genes have multiple isoforms, including splicing variants, alternative sites
of transcription initiation and termination, exon skipping events, and more. The number of human transcripts
in standard gene databases has grown enormously, from ~40,000 in the late 2000s to over 200,000 today, but
it is still likely far from complete. Our previous work using exon-exon splice junctions and other fragmentary
transcripts has demonstrated the clinical relevance of unannotated but expressed genes in the human brain,
including associations with schizophrenia and its genetic risk. This project will attempt to discover and
characterize novel gene isoforms collected from both healthy and diseased brains, using the latest
computational methods for transcriptome assembly and an extensive collection of brain RNA-seq datasets. The
project is organized into three aims: first, we will develop new algorithms designed to assemble RNA-seq data
from samples that have been sequenced using ribosomal RNA depletion, a technique that is widely used in
human brain studies but that is not used in most other RNA-seq experiments, which instead use polyA+
enrichment. We will implement these methods as extensions to the HISAT and StringTie systems for RNA-seq
alignment and assembly, both of which were developed in the PI's and co-PI's labs. We will then apply these
improved methods to thousands of publicly available RNA-seq samples from human brain tissue to create a
new "CHESS-BRAIN" (Comprehensive Human Expressed Sequences in Brain) gene annotation database. This
effort will also determine which transcripts are tissue-specific and brain-region specific; i.e., expressed at
significantly higher or lower levels in brain tissues and in various brain regions as compared to other tissues. In
the second aim, we will use these methods to quantify gene expression levels in hundreds of post-mortem brain
RNA-seq samples from subjects diagnosed with schizophrenia (SCZD), major depression (MDD), bipolar
disorder (BPD), autism spectrum disorder (ASD), and post-traumatic stress disorder (PTSD), whom we will
compare to matched controls to identify the contribution of unannotated transcription in these disorders. In
our third aim, we will perform expression quantitative trait loci (eQTL) mapping across the entire CHESS-BRAIN
dataset, both within and across brain regions and diagnoses, to identify genetic regulation of unannotated
transcripts, including both coding and noncoding transcripts. This analysis will identify genes and transcripts
whose expression levels change significantly in different tissues and diseases. We will combine these results to
identify novel transcripts associated with genetic risk for each of the psychiatric diso...

## Key facts

- **NIH application ID:** 10761728
- **Project number:** 5R01MH123567-04
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** Steven L. Salzberg
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $651,982
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2021-03-02 → 2025-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10761728

## Citation

> US National Institutes of Health, RePORTER application 10761728, Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness (5R01MH123567-04). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/10761728. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
