Delivering FAIR Datasets for the Neglected Parasite Trichomonas vaginalis and Studies in Comparative Genomics

NIH RePORTER · NIH · R21 · $237,750

Abstract

Project Summary: Trichomonas vaginalis is a flagellated, anaerobic protist that causes trichomoniasis, the most common non-viral sexually transmitted disease in humans, with prevalence estimated at 13% for non-Hispanic Black women in the USA and 5% for women aged 15-49 globally. Trichomoniasis can induce painful genital tract inflammation and discharge, increase the risk of HIV acquisition and transmission, and lead to pregnancy sequelae that include preterm delivery and low birth weight. The Centers for Disease Control and Prevention has identified trichomoniasis as a neglected parasitic infection and targeted it as a priority for public health action. In addition to T. vaginalis, virtually all known trichomonads are parasites or commensals of vertebrates, and include other human-infecting species (Trichomonas tenax, Pentatrichomonas hominis), devastating pathogens of birds (Trichomonas gallinae, Trichomonas stableri), an economically important pathogen of cattle (Tritrichomonas foetus), and pathogens of pets (Tr. foetus, P. hominis). Alarmingly, the host ranges of some trichomonads suggest that they can be agents of disease transmitted between humans and animals (i.e., zoonotic). Research on trichomonads remains neglected; genomic studies to date are primarily confined to T. vaginalis, a draft assembly of which (TvG3_2007) was published by our group in 2007 and deposited in public databases, including the NIH-funded Bioinformatics Resource Center (BRC) TrichDB, part of the EuPathDB suite of eukaryotic pathogen databases. While TvG3_2007 was groundbreaking and fruitful for research into the basic biology of T. vaginalis, it is highly fragmented due to the enormous complement of high copy number sequences, particularly long transposable elements (TEs), as well as expanded gene families.
Uncertainty about gene and TE copy numbers due to assembly fragmentation is compounded by the large proportion (>70%) of the ~50,000 predicted protein-coding genes that could not be annotated beyond 'hypothetical' or 'conserved hypothetical' status. TvG3_2007 remains the only trichomonad genome deposited in TrichDB, and only its TE annotation has been updated since its release. All of these factors are obstacles to studying T. vaginalis and other trichomonad pathogens at genomic, evolutionary, and molecular levels. This proposal seeks to remedy this situation by using modern databases and tools, including those available from the EuPathDB BRC, to annotate a new, high-quality, long-read T. vaginalis assembly that we have recently generated, and to transfer that annotation to 17 assemblies that we and colleagues have also generated from eight trichomonad species. All data will be deposited in TrichDB, massively expanding the genomic assets available for T. vaginalis and trichomonad research. We will subsequently use the improved TrichDB resource to conduct comparative genomics across the trichomonads to elucidate differences in genome characteristics, gene family expansion, and ...

Key facts

NIH application ID: 9877317
Project number: 1R21AI149449-01
Recipient: NEW YORK UNIVERSITY
Principal Investigator: JANE M CARLTON
Activity code: R21
Funding institute: NIH
Fiscal year: 2020
Award amount: $237,750
Award type: 1
Project period: 2020-01-17 → 2021-12-31