Analytical Infrastructure for Multiple Sample Single Cell Genomic Data

NIH RePORTER · NIH · R01 · $361,069

Abstract

Project Summary

Single cell genomic technologies are evolving fast and can measure a variety of omics data modalities in individual cells. With a wide range of applications, such as discovering new cell types, mapping temporal and spatial cellular programs in development, tissues, and organs, and studying cell-cell interactions in tumor microenvironments, these technologies are rapidly transforming biomedical research. Early single cell genomic studies focused primarily on characterizing how cells differ within a sample. Recently, however, single cell studies increasingly produce a large number of biological or patient samples, creating new opportunities and growing demands for studying how samples differ and how omics programs are associated with sample phenotype. Despite these new opportunities and demands, sample-level heterogeneity represents an extra layer of complexity that existing data analysis methods do not fully address. This proposal aims to develop new analytical methods and software tools to address three open challenges in the unsupervised analysis of multi-sample single cell genomic data at population scale, across data modalities, and across species.

Aim 1 will address the challenge that longitudinal data with densely sampled time points from the same patient are difficult to obtain for studying dynamic cellular programs along disease progression. We will develop an alternative strategy and analytical method, sample trajectory analysis, to infer the temporal progression of sample phenotype and its associated cellular programs using cross-sectionally collected samples.

Aim 2 will tackle the problem of integrating data across samples and modalities to allow analyses of sample similarities and differences. We will develop a systematic sample harmonization method to address challenges in harmonizing data with unmatched features, choosing the optimal feature type and resolution, and removing unwanted technical noise while keeping meaningful biological variation. We will also create an analytical framework to support systematic unsupervised analysis of sample heterogeneity.

Aim 3 will develop a solution for comparative analysis of multi-sample single cell data across species, which is important for identifying conserved and diverged biological processes between humans and animal models. Such knowledge is fundamental for designing and interpreting animal model experiments for studying human diseases.

Upon completion of this proposal, we will deliver our methods through open-source software tools. These tools will be widely useful for analyzing single cell genomic data with multiple samples. By addressing several major challenges in single cell genomic data analysis, our new methods and tools will help unleash the full potential of single cell genomic technologies for biomedical research and can have a major impact on advancing our understanding of both basic biology and human diseases.

Key facts

NIH application ID: 10989329
Project number: 1R01HG013409-01A1
Recipient: JOHNS HOPKINS UNIVERSITY
Principal Investigator: Hongkai Ji
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $361,069
Award type: 1
Project period: 2024-09-01 → 2028-06-30