# Identifying disease-relevant cell types by integrating genetic and functional genomics data

> **NIH DP5** · BROAD INSTITUTE, INC. · 2021 · $445,000

## Abstract

Project Summary: Large-scale datasets such as those generated by GTEx, the Roadmap Epigenomics
Consortium, and the ENCODE project are valuable new resources for understanding the genetic basis of
disease. We now have data on gene expression and many functional elements such as histone modifications
and DNase-I Hypersensitivity Sites (DHS) in a variety of cell types and tissues in humans. Analysis of these
datasets, together with data from genome-wide association studies (GWAS), has the potential to lead to
breakthroughs in our understanding of the causes of disease. While statistical and computational methods for
integrative analysis of these datasets with GWAS datasets have already led to many interesting advances,
there is a great need for further methodological progress to translate this abundance of data into concrete
mechanistic insights. We will focus on the fundamental problem of identifying disease-relevant cell types and
tissues via integrative analysis of these datasets. Our work is motivated by the fact that the substantial majority
of disease heritability lies in non-coding regions, and regulatory elements often exhibit strong cell-type
specificity. Thus, to understand the mechanistic consequences of genetic variation by either computational or
experimental means, we need to identify the cell types and tissues in which the relevant processes are
occurring. While these are known for some complex phenotypes, they are uncertain or unknown for many; for
example, while it is known that schizophrenia is a brain disease, recent evidence indicates that the complement
system is involved in schizophrenia pathogenesis through its role in synaptic pruning, and the relevant cell
types remain unresolved. Despite the importance of this problem, developing a powerful method for
identification of cell types and tissues using GWAS data remains an open challenge. Our approach will have two
components: first, we will develop methods for using genetic data to assess whether a given genomic
annotation—i.e. a subset of the genome—is important for the phenotype we are studying. We will build on a
method we previously developed for enrichment analysis that powerfully leverages polygenic signal,
extending it so that it can analyze rare variant data, combine signal from multiple sources of data about a
single cell type/tissue, and investigate shared cell types/tissues across traits. Second, we will use gene
expression data and functional genomics data to construct, for each candidate cell type/tissue, genomic
annotations that are maximally informative about cell-type specific activity. We will begin by using specifically
expressed genes, which have not been fully leveraged in this context, and we will also develop new methods
for constructing maximally informative genomic annotations from chromatin data like that available from
Roadmap. We will continue our practice of releasing open-source, user-friendly software and data. Together,
our new methods and annotations will all...

## Key facts

- **NIH application ID:** 10247697
- **Project number:** 5DP5OD024582-05
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Hilary Finucane
- **Activity code:** DP5
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $445,000
- **Award type:** 5
- **Project period:** 2017-09-01 → 2022-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10247697

## Citation

> US National Institutes of Health, RePORTER application 10247697, Identifying disease-relevant cell types by integrating genetic and functional genomics data (5DP5OD024582-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10247697. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*