Integrative analysis of multi-omics data to identify and characterize long noncoding RNA-derived fusions in pediatric cancer

NIH RePORTER · NIH · R03 · $334,926

Abstract

Cancer remains the leading cause of death by disease past infancy among children in the United States. In contrast to adult cancers, which carry many genetic mutations, most pediatric cancers have few. Instead, recent studies have shown that fusion RNAs and their encoded proteins may drive tumorigenesis in children. Fusion RNAs are formed from exons of two genes. With the launch of the Fusion Oncoproteins in Childhood Cancers Consortium, more fusion proteins are being found and studied. However, a complete understanding of the mechanisms driving pediatric cancer remains elusive, mainly because of three unsolved challenges. First, current studies have focused on mRNA-derived fusion proteins and have not explored long noncoding RNA-derived fusion transcripts (lnc-fusions) and their encoded proteins in pediatric cancer, although lnc-fusions have been reported to regulate anti-tumor immunity in adult cancer. Long noncoding RNAs (lncRNAs) are transcripts of at least 200 nucleotides that cannot encode proteins. lncRNAs greatly outnumber mRNAs and play critical roles in various cancers. Therefore, a complete investigation of the mechanisms driving pediatric cancer is not possible without expanding the study of fusion proteins to include lnc-fusions. Second, existing lnc-fusion detection methods cannot detect lnc-fusions that are derived from novel lncRNAs. Because lncRNAs are highly disease-specific, most lncRNAs in pediatric cancer have not been annotated. Third, fusion RNAs, including lnc-fusions, may be formed by alternative mechanisms, such as chromosome rearrangement or aberrant splicing events. These alternative mechanisms complicate the understanding of the underlying genetics and thus treatment. The large amount of multi-omics data from various Common Fund sources enables us to address these challenges in pediatric cancer. Previously, we developed computational methods to identify and characterize lncRNAs in human diseases and development.
To discover molecular drivers in pediatric cancer, we will extend our previous studies of lncRNAs to identify lnc-fusions from RNA sequencing data (Aim 1). We will further determine the potential functions and formation mechanisms of lnc-fusions using integrative methods (Aim 2). Machine learning algorithms will be used to identify lnc-fusions as putative biomarkers and prognostic biomarkers in pediatric cancers. This study will focus on neuroblastoma and myeloid malignancies because these pediatric cancers have large cohorts of RNA sequencing and whole-genome sequencing data in the Gabriella Miller Kids First Dataset. In summary, we will discover lnc-fusions in pediatric cancers and develop computational methods and frameworks broadly applicable to existing and future RNA sequencing datasets. This study will improve the utility of three selected Common Fund datasets (Kids First, GTEx, and 4D Nucleome) and two external databases (GEO and TCGA).

Key facts

NIH application ID: 10577314
Project number: 1R03OD034498-01
Recipient: UNIV OF MASSACHUSETTS MED SCH WORCESTER
Principal Investigator: Chan Zhou
Activity code: R03
Funding institute: NIH
Fiscal year: 2022
Award amount: $334,926
Award type: 1
Project period: 2022-09-20 → 2024-09-19