# A Structure-based orthology approach to predict protein function in eukaryotic parasites

> **NIH R21** · SEATTLE CHILDREN'S HOSPITAL · 2024 · $293,550

## Abstract

PROJECT SUMMARY

Eukaryotic parasites are a diverse group of organisms that cause a wide range of infectious diseases in
humans. These diseases have a significant impact on global health, affecting millions of people
annually. Affordable genome sequencing has transformed the study of pathogens for therapeutic
development. However, functional annotation of the proteins encoded by these genomes has not kept pace
with advances in sequencing technology. Traditional methods based on sequence orthology have left a third
of all proteins in VEuPathDB without full functional annotation. VEuPathDB is a sequence database
dedicated to eukaryotic parasites and a Bioinformatics Resource Center (BRC) project funded by the
National Institute of Allergy and Infectious Diseases (NIAID).

To overcome this challenge, we propose a novel structure-based orthology approach to predict protein
function. The approach combines AlphaFoldDB, a large repository of precomputed protein structure models;
Foldseek, an algorithm that aligns protein structures quickly and accurately; and OrthoMCL sequence-based
orthology. If successful, this project could substantially advance infectious disease research.
Functionally annotating thousands of uncharacterized proteins in VEuPathDB will help identify potential
targets for further functional studies and therapeutic development. Furthermore, automating this process
would be a significant advance over the current manual process, which requires extensive structural
biology expertise.
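
The abstract does not say how Foldseek hits and OrthoMCL groups are combined. The following is a minimal sketch under stated assumptions: Foldseek's default tab-separated search output (12 BLAST-like columns, E-value in the 11th), an illustrative E-value cutoff of 1e-3, and a hypothetical mapping from protein IDs to OrthoMCL group IDs. Proteins linked by structural hits are merged with a union-find (connected components), a simplification that works on whole proteins rather than domains, and each resulting group is reported with the OrthoMCL groups it spans. A group spanning several OrthoMCL groups, or containing proteins with no OrthoMCL group (`None`), marks structural links that sequence orthology alone would miss. All IDs and helper names are hypothetical.

```python
from collections import defaultdict

# Assumption: Foldseek easy-search default tabular output (no --format-output), 12 columns:
# query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits
EVALUE_COLUMN = 10


def parse_foldseek_hits(lines, max_evalue=1e-3):
    """Yield (query, target) pairs that pass an illustrative E-value cutoff."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated rows
        query, target = fields[0], fields[1]
        if query != target and float(fields[EVALUE_COLUMN]) <= max_evalue:
            yield query, target


class UnionFind:
    """Minimal disjoint-set structure for merging proteins linked by structural hits."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def structure_groups(hit_lines, orthomcl_group, max_evalue=1e-3):
    """Group proteins by structural hits; report the OrthoMCL groups each group spans.

    orthomcl_group maps protein ID -> OrthoMCL group ID; proteins absent from it map to None.
    """
    uf = UnionFind()
    for query, target in parse_foldseek_hits(hit_lines, max_evalue):
        uf.union(query, target)
    members = defaultdict(set)
    for protein in list(uf.parent):
        members[uf.find(protein)].add(protein)
    groups = [(sorted(proteins), {orthomcl_group.get(p) for p in proteins})
              for proteins in members.values()]
    return sorted(groups, key=lambda group: group[0])


if __name__ == "__main__":
    # Hypothetical protein IDs and OrthoMCL group IDs, for illustration only.
    hits = [
        "protA\tprotB\t0.25\t150\t100\t5\t1\t150\t3\t152\t1e-10\t120",
        "protB\tprotC\t0.18\t140\t110\t6\t2\t141\t1\t140\t5e-5\t80",
        "protC\tprotD\t0.10\t60\t50\t4\t1\t60\t1\t60\t0.5\t20",
    ]
    orthomcl = {"protA": "OG_1", "protB": "OG_1", "protC": "OG_2"}
    for proteins, spanned in structure_groups(hits, orthomcl):
        print(proteins, sorted(spanned, key=str))
```

Connected components are a deliberately crude stand-in: one spurious hit can merge unrelated families, so a production pipeline would likely need domain parsing and stricter clustering before groups are treated as orthologs.
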

Specific Aims:

1. Define Domain-based Structure Orthology Groups (DSOGs) at scale by leveraging AlphaFoldDB and Foldseek
   to identify structural orthologs and rank them by the conservation of positions in structure-based
   sequence alignments.
2. Predict the function of DSOGs at scale with natural language processing techniques, such as ProtNLM,
   to generate unified names and functional annotations for proteins with similar functions within DSOGs.
   We will collaborate with VEuPathDB to update existing official product names and annotations.

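
Aim 1 ranks structural orthologs by positional conservation but does not state a scoring rule. The sketch below assumes one: per-column Shannon entropy over non-gap residues in a structure-based multiple alignment, with illustrative cutoffs of 1 bit for entropy and 50% for gap fraction. Each sequence is then scored by the fraction of conserved columns at which it carries the consensus residue. The function names, cutoffs, and toy sequences are hypothetical and are not taken from the project.

```python
import math
from collections import Counter

GAP = "-"


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != GAP]
    if not residues:
        return float("inf")  # an all-gap column carries no conservation signal
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def conserved_columns(alignment, max_entropy=1.0, max_gap_fraction=0.5):
    """Return (column index, consensus residue) for columns passing both illustrative cutoffs."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = [row[i] for row in alignment]
        gap_fraction = column.count(GAP) / len(column)
        if gap_fraction <= max_gap_fraction and column_entropy(column) <= max_entropy:
            consensus = Counter(r for r in column if r != GAP).most_common(1)[0][0]
            conserved.append((i, consensus))
    return conserved


def rank_candidates(names, alignment, **cutoffs):
    """Rank sequences by the fraction of conserved columns where they carry the consensus residue."""
    conserved = conserved_columns(alignment, **cutoffs)
    if not conserved:
        return [(name, 0.0) for name in names]
    scores = [
        (name, sum(row[i] == residue for i, residue in conserved) / len(conserved))
        for name, row in zip(names, alignment)
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical structure-based alignment; the first row is the query.
    names = ["query", "candA", "candB", "candC"]
    alignment = ["MKV-LG", "MKVALG", "MRV-IG", "AKT-LS"]
    for name, score in rank_candidates(names, alignment):
        print(f"{name}\t{score:.2f}")
```

In this toy alignment the mostly-gap fourth column is excluded, so five columns count as conserved; `candA` and the query match the consensus at all five, while `candB` and `candC` score lower. Entropy is only one of several possible conservation measures, chosen here because it needs no substitution matrix.
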
By completing these specific aims, our project will deliver annotations for thousands of uncharacterized
proteins in VEuPathDB, as well as an automated pipeline that streamlines the laborious process of protein
functional annotation. These annotations will provide insights into the biology of parasitic organisms,
help identify potential targets for further functional studies, and facilitate the development of novel
therapeutic interventions for infectious diseases.

## Key facts

- **NIH application ID:** 10868084
- **Project number:** 1R21AI182872-01
- **Recipient organization:** SEATTLE CHILDREN'S HOSPITAL
- **Principal Investigator:** Isabelle Phan
- **Activity code:** R21 (Exploratory/Developmental Research Grant)
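- **Funding institute:** NIH, National Institute of Allergy and Infectious Diseases (NIAID; institute code AI in the project number)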
- **Fiscal year:** 2024
- **Award amount:** $293,550
- **Award type:** 1 (new application)
- **Project period:** 2024-06-11 → 2026-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10868084

## Citation

> US National Institutes of Health, RePORTER application 10868084, A Structure-based orthology approach to predict protein function in eukaryotic parasites (1R21AI182872-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10868084. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
