Finishing multiple genomes in EuPathDB using Oxford Nanopore Single Molecule sequencing

NIH RePORTER · NIH · R21 · $197,918

Abstract

PROJECT SUMMARY/ABSTRACT: Toxoplasma gondii is an important opportunistic pathogen of humans, in whom it can cause severe disease in the developing fetus and in those with HIV/AIDS. Despite extensive efforts by the research community to sequence, assemble and annotate multiple genomes for this organism, these genome sequences remain incomplete due to repetitive and uncloneable sequence. A major reason for this knowledge gap is that the sequencing technologies used (1st and 2nd generation) cannot fully resolve these loci. This prevents fully effective use of the data (which are hosted on the EuPathDB Bioinformatics Resource Center; BRC) by the research community, since thousands of base pairs of data are missing and/or unassembled. Here we propose to resequence and generate de novo assemblies for multiple T. gondii isolates (as well as two other species that serve as comparators) using 3rd generation sequencing and chromosome conformation-based sequencing approaches, and then to annotate them and integrate them into the EuPathDB BRC. Our preliminary data show the feasibility of this approach: we have used it to revise the karyotype for T. gondii (discovering that it harbors 13, rather than 14, chromosomes), increase the total genome assembly by ~2 Mb, and perform genome-wide analyses of structural and/or copy number variation at loci with a known role in T. gondii pathogenesis. The proposed studies are responsive to RFA PA-19-068, “Secondary Analysis of Existing Datasets for Advancing Infectious Disease Research”, by specifically using data outside of the EuPathDB BRC (our de novo assemblies and annotations) to improve the utility of data within the EuPathDB BRC (gene expression, annotation and proteomics data, for example). Moreover, the analysis pipeline will rely on the existing genome sequence data within the EuPathDB BRC to identify sequence differences between our new assemblies and those hosted by the BRC.
In addition to the expertise of the PI in genome sequencing and in the function of multicopy loci encoding pathogenesis determinants, the proposed studies benefit from the assembled team, which includes experts in chromosome conformation capture-based sequencing approaches (Le Roch) and in sequence assembly and annotation (Lorenzi).

Key facts

NIH application ID: 10188420
Project number: 5R21AI154386-02
Recipient: UNIVERSITY OF PITTSBURGH AT PITTSBURGH
Principal Investigator: Jon P Boyle
Activity code: R21
Funding institute: NIH
Fiscal year: 2021
Award amount: $197,918
Award type: 5
Project period: 2020-06-10 → 2023-05-31