Identification of Transposable Element Insertions in the Kids First Data

NIH RePORTER · NIH · R03 · $169,041

Abstract

Project Summary

Insertion of transposable elements (TEs, sometimes referred to as “jumping genes”) into the human genome can be pathogenic. Our aim in this project is to use sophisticated computational approaches to characterize TE insertions in the whole-genome sequencing data generated by the Gabriella Miller Kids First Pediatric Research Program and to identify insertional mutations that may disrupt gene function. The large scale of the Kids First program provides an unprecedented opportunity to investigate the role of TE insertions in childhood cancers and structural birth defects, and to create a resource of reference TE maps that will be important for all other TE studies. We will first adapt our existing algorithm, xTEA, to the trio design of the Kids First studies and improve its accuracy and efficiency. We will then apply it to the thousands of trios that have been profiled in the Kids First program, using a pipeline optimized for the cloud environment. The resulting set of TE insertions (especially L1, Alu, SVA, and HERV insertions) will be curated with all relevant features and made available to the community as a database. We will also apply machine learning methods to improve the calls once sufficient training data have been obtained. To investigate the potential pathogenicity of these insertions, we will first focus on those within genes, but we will also explore those in regulatory elements inferred from epigenetic profiling data.

Key facts

NIH application ID: 9957262
Project number: 1R03CA249364-01
Recipient: HARVARD MEDICAL SCHOOL
Principal Investigator: Peter J Park
Activity code: R03
Funding institute: NIH
Fiscal year: 2020
Award amount: $169,041
Award type: 1
Project period: 2020-06-01 → 2022-05-31