Functional annotation of new genes aided by deep learning

NIH RePORTER · NIH · R35 · $317,418

Abstract

New genes (NGs) are generated by multiple mechanisms, and their end-piece sequences are identified as chimeric transcript sequences from multiple human sources, including healthy and diseased tissues. NGs have therefore been recognized as important biomarkers and therapeutic targets for precision medicine. Many efforts have been made to study the function of individual NGs and to identify relevant drug targets. However, current in-depth research and achievements are concentrated mainly on a few driver NGs, and classical cancer drugs have been used to target NG domains directly, such as the kinase domain of the BCR-ABL1 fusion protein in leukemia. Some fusion proteins that retain DNA-binding domains, such as transcription factors, can directly bind their target genes; for example, the EWSR1-FLI fusion actively recruits the BAF complex. Recently, the downstream effectors of driver NGs have emerged as therapeutic targets. For example, targeting the downstream effector CCND2 inhibited RUNX1/ETO-driven leukemic expansion in vitro and in vivo, and inhibition of STAT5, a downstream factor of NUP214-ABL1, induced leukemia cell death. However, the functions of most identified NGs have not been systematically investigated, mainly because of the limitations of traditional tools and the high cost of experimental procedures. There is therefore an urgent need to develop new tools that systematically analyze NG breakpoint-specific features across the human genome and predict their originating and regulatory mechanisms, such as upstream and downstream effectors. In-depth annotation based on NG structure is important for understanding the cellular mechanisms of NGs. Effective use of systematic bioinformatics tools for functional annotation can provide deeper insight into the role of NGs in the development and progression of diseases such as cancers and help identify direct and indirect therapeutic targets.
In this study, we will develop five bioinformatics tools for the functional annotation and feature analysis of NGs, a predictive pipeline for automatic analysis of downstream effects of NGs, and a predictive method for tracing the origin of NGs.

Key facts

NIH application ID
10898043
Project number
5R35GM138184-05
Recipient
UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
Principal Investigator
Pora Kim
Activity code
R35
Funding institute
NIH
Fiscal year
2024
Award amount
$317,418
Award type
5
Project period
2020-09-01 → 2025-06-30