Computational Methods to Integrate and Interpret the Transcriptome from Single Cell and Tissue Level Data

NIH RePORTER · NIH · R01 · $505,168

Abstract

In the past decade, substantial progress has been made in the discovery of genetic variants and genes associated with risk for psychiatric disorders. Altered gene expression in the brain, particularly at the cell-type-specific level, is believed to be a driving factor in conferring risk through these genetic variants. To link altered transcription to psychopathology, an immense amount of transcriptomic data is being accumulated, including single-cell and tissue-level transcriptomes. Some of these samples cover critical developmental periods. An outstanding challenge is how to integrate single-cell and tissue-level transcriptomic data and to determine how genetic variation alters transcription in specific cells to produce psychopathology. In this high-dimensional ‘omics setting, we need powerful statistical and machine learning tools to produce integrative analyses and mesh those results with large psychiatric genetic datasets to achieve new insights. We propose to use our expertise in high-dimensional statistical inference to tackle this challenge. We go beyond machine learning models that specialize in prediction, focusing instead on providing interpretable statistical inferences. We identify gene communities, defined in terms of cell type and spatiotemporal window, that drive risk. With vast amounts of data comes great risk of spurious inferences based on non-rigorous analyses. On the other hand, reliable but naïve tools can sacrifice power by not fully integrating all available information. Our overall objective is to produce analytic tools that yield reliable and powerful inferences relating cell-type-specific gene expression with genetic risk factors. With these analytical tools made available to the research community, our longer-term goal is to hasten discoveries in the field and thus build the foundation from which therapeutic targets for psychiatric disorders emerge.
Our objectives will be accomplished with the following specific aims: 1) statistically rigorous methods to select cell-type markers and to estimate cell-type-specific (CTS) expression, which will facilitate downstream analyses, including CTS eQTLs from tissue; 2) modeling dynamic gene communities throughout development of cell lineages or tissue and relating them to community-based-score statistics to gain insight into the impact of genetic risk factors on psychiatric disorders; and 3) novel methods for estimating gene co-expression networks from single-cell RNA-seq. This contribution is significant because it will make many transcriptomic resources more valuable and enable downstream analyses, such as detection of CTS eQTLs in larger sample sets with higher power. Dynamic network analysis tools enhance our ability to identify gene communities that vary over developmental epochs, and this variation facilitates inferences that relate cell type and developmental period with risk factors. The research proposed is innovative, in our opinion, because it uses novel statistical methods for integrative an...

Key facts

NIH application ID: 10144504
Project number: 5R01MH123184-02
Recipient: CARNEGIE-MELLON UNIVERSITY
Principal Investigator: KATHRYN M ROEDER
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $505,168
Award type: 5
Project period: 2020-05-01 → 2024-02-29