# Data and analysis ecosystem for eukaryotic pathogen targeted sequencing

> **NIH U01** · UNIVERSITY OF CALIFORNIA, SAN FRANCISCO · 2024 · $524,724

## Abstract

Genomic data can provide an invaluable source of information to understand pathogen evolution, identify
patterns of transmission, and characterize phenotypes such as drug resistance and immune escape. For
eukaryotic pathogens, larger genomes, sexual recombination, and complicated transmission dynamics
including polyclonal infections have historically limited the use of genomic data for many of these applications.
However, recent laboratory developments, including multiplexed targeted sequencing, have rapidly increased
the pace of genomic data generation for eukaryotic pathogens. Fundamental differences in the biology and
transmission of infections caused by these pathogens render many of the genomic data and analysis tools
developed for other organisms (primarily humans, viruses, and bacteria) difficult or impossible to use. As a
result, many research efforts have needed to rely on bespoke methods for processing and analysis, limiting the
reusability of data, the accuracy and reproducibility of results, and more generally the productivity of scientists
studying eukaryotic pathogens. There is a need to develop software and computational tools to process, store,
share, and analyze these data in a way that sets standards, encourages innovation, and facilitates scientific
discovery. We will a) develop a suite of data standards and robust software, including bioinformatic pipelines
and tools to facilitate the sharing and storage of these data, b) build a modular software toolkit to conduct
downstream statistical analyses relevant for epidemiologic and population genetic research, and c) work within
a community of advisors and experts in the analysis of genomic data for eukaryotic pathogens to develop
approaches that meet the needs of the community and encourage broader uptake. The proposed work also
includes harmonizing and developing standards for genomic, epidemiological, and clinical data. We will initially
focus on Plasmodium falciparum, a species of parasite that causes malaria infections, as an organism of direct
application which exhibits key complexities of eukaryotic pathogens (recombination, polyclonal infections). As
such, standards and software developed during this proposal will have relevance beyond this single organism
to other eukaryotic pathogens where similar biological and transmission complexities limit the use of existing
tools to leverage genomic data to answer questions relevant to science and public health. The expected
outcome of the proposed research is computational software able to analyze targeted amplicon sequencing
data for eukaryotic pathogens. By developing these approaches, with clear engagement within the broader
research community, this work will help change the landscape of analyses possible with complex genomic
data.

## Key facts

- **NIH application ID:** 10948571
- **Project number:** 1U01AI184646-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
- **Principal Investigator:** Bryan R Greenhouse
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $524,724
- **Award type:** 1
- **Project period:** 2024-06-21 → 2025-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10948571

## Citation

> US National Institutes of Health, RePORTER application 10948571, Data and analysis ecosystem for eukaryotic pathogen targeted sequencing (1U01AI184646-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10948571. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
