# Delivering FAIR Datasets for the Neglected Parasite Trichomonas vaginalis and Studies in Comparative Genomics

> **NIH R21** · NEW YORK UNIVERSITY · 2020 · $237,750

## Abstract

Trichomonas vaginalis is a flagellated, anaerobic protist that causes trichomoniasis, the most common
non-viral sexually transmitted disease in humans, with prevalence estimated at 13% among non-Hispanic
Black women in the USA and 5% among women aged 15-49 globally. Trichomoniasis can induce painful genital tract
inflammation and discharge, increase the risk of HIV acquisition and transmission, and have pregnancy
sequelae that include preterm delivery and low birth weight. The Centers for Disease Control and Prevention
has identified trichomoniasis as a neglected parasitic infection and targeted it as a priority for public health
action. In addition to T. vaginalis, virtually all known trichomonads are parasites or commensals of vertebrates,
and include other human-infecting species (Trichomonas tenax, Pentatrichomonas hominis), devastating
pathogens of birds (Trichomonas gallinae, Trichomonas stableri), an economically important pathogen of cattle
(Tritrichomonas foetus), and pathogens of pets (Tr. foetus, P. hominis). Alarmingly, the host ranges of some
trichomonads suggest that they can be agents of disease transmitted between humans and animals (i.e.,
zoonotic). Research on trichomonads remains neglected; genomic studies to date are primarily confined to T.
vaginalis, a draft assembly of which (TvG3_2007) was published by our group in 2007 and deposited in public
databases, including the NIH-funded Bioinformatics Resource Center (BRC) TrichDB, part of the EuPathDB
suite of eukaryotic pathogen databases. While TvG3_2007 was groundbreaking and fruitful for research into
the basic biology of T. vaginalis, it is highly fragmented due to the enormous complement of high copy number
sequences, particularly long transposable elements (TEs), as well as expanded gene families. Uncertainty
about gene and TE copy numbers due to assembly fragmentation is compounded by the enormous number
(>70%) of the ~50,000 predicted protein-coding genes that could not be annotated beyond 'hypothetical' or
'conserved hypothetical' status. TvG3_2007 remains the only trichomonad genome deposited in TrichDB, and
only its TE annotation has been updated since then. All of these factors are obstacles to studying T. vaginalis
and other trichomonad pathogens at genomic, evolutionary, and molecular levels. This proposal seeks to
remedy this situation by using modern databases and tools, including those available from the EuPathDB BRC,
to annotate a new, high-quality, long-read T. vaginalis assembly that we have recently generated, and transfer
that annotation to 17 assemblies that we and colleagues have also generated from eight trichomonad species.
All data will be deposited in TrichDB, massively expanding the genomic assets available for T. vaginalis and
trichomonad research. We will subsequently use the improved TrichDB resource to conduct comparative
genomics across the trichomonads to elucidate differences in genome characteristics, gene family expansion,
and ...

## Key facts

- **NIH application ID:** 9877317
- **Project number:** 1R21AI149449-01
- **Recipient organization:** NEW YORK UNIVERSITY
- **Principal Investigator:** JANE M CARLTON
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $237,750
- **Award type:** 1
- **Project period:** 2020-01-17 → 2021-12-31
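
Several of the fields above (award type, activity code) also appear inside the project number itself. The following is a minimal sketch of how `1R21AI149449-01` could be split into its parts, assuming the commonly used NIH layout of a one-digit application type, a three-character activity code, a two-letter institute code, a six-digit serial number, and a two-digit support year. The names `ProjectNumber` and `parse_project_number` are hypothetical, and optional suffixes (for example amendment or supplement codes) are not handled.

```python
import re
from typing import NamedTuple


class ProjectNumber(NamedTuple):
    """Hypothetical container for the parts of an NIH project number."""
    application_type: str  # e.g. "1" (matches "Award type" above)
    activity_code: str     # e.g. "R21"
    institute_code: str    # e.g. "AI"
    serial_number: str     # e.g. "149449"
    support_year: str      # e.g. "01"


# Assumed layout: type digit, 3-char activity code, 2-letter institute code,
# 6-digit serial number, hyphen, 2-digit support year. No suffixes allowed.
_PATTERN = re.compile(r"^(\d)([A-Z][A-Z0-9]{2})([A-Z]{2})(\d{6})-(\d{2})$")


def parse_project_number(text: str) -> ProjectNumber:
    """Split a project number into its fields, or raise ValueError."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized project number: {text!r}")
    return ProjectNumber(*match.groups())


parsed = parse_project_number("1R21AI149449-01")
print(parsed)
```

For this record, the parsed application type and activity code agree with the "Award type" and "Activity code" entries listed above.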

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9877317

## Citation

> US National Institutes of Health, RePORTER application 9877317, Delivering FAIR Datasets for the Neglected Parasite Trichomonas vaginalis and Studies in Comparative Genomics (1R21AI149449-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/9877317. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
