# Methods For Evolutionary Genomics Analysis

> **NIH R35** · TEMPLE UNIV OF THE COMMONWEALTH · 2024 · $231,674

## Abstract

The parent R35 research program aims to develop innovative methods and tools for the
comparative analysis of molecular sequences. The focus is on creating machine-learning
methods to perform big data analytics, gaining biological insights, and comparing these with
traditional model-based methods in molecular evolution and phylogenetics. A key development in
this program is the Evolutionary Sparse Learning (ESL) framework, designed to enhance
molecular evolutionary analyses. Although ESL has been benchmarked against classical
methods using high-performance computing (HPC) resources, benchmarking against advanced
deep learning (DL) approaches remains infeasible due to the need for substantial computational
power. To address this, we request a Graphics Processing Unit (GPU) cluster to enable DL
analyses essential for advancing our research. Two major example projects highlight the need for
this system. The first project focuses on discovering fragile clades and causal sequences in
phylogenomics. We have developed metrics for gene-species sequence concordance and clade
probability using ESL models, validated across many phylogenomic datasets. Benchmarking
these ESL methods against DL approaches, such as MSA Transformer, is crucial. MSA
Transformer captures phylogenetic relationships using multiple sequence alignments (MSAs) but
requires refinement for orthologous protein sets, demanding a powerful GPU system. The second
project aims to uncover molecular convergences that parallel organismal convergent evolution.
Using ESL, we have built genetic models to understand the independent origins of traits such as
C4 photosynthesis in grasses and echolocation in mammals. Benchmarking revealed that current
methods, including ESL, are limited in detecting convergences involving different residues at
different sites. Therefore, we are developing ESL approaches leveraging DL-generated protein
embeddings to infer non-identical sequence convergence. Fine-tuning general DL models for
orthologous sequences requires a dedicated GPU cluster, as existing resources are inadequate
for the extensive analyses needed. The requested GPU cluster is essential for refining these DL
models and conducting comprehensive analyses, enhancing the impact and scope of our parent
grant. Our experienced team and institutional support ensure effective use and maintenance of
the equipment, promoting continued advancements in molecular evolutionary analysis.

## Key facts

- **NIH application ID:** 11099368
- **Project number:** 3R35GM139540-04S1
- **Recipient organization:** TEMPLE UNIV OF THE COMMONWEALTH
- **Principal Investigator:** Sudhir Kumar
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $231,674
- **Award type:** 3
- **Project period:** 2021-02-01 → 2026-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11099368

## Citation

> US National Institutes of Health, RePORTER application 11099368, Methods For Evolutionary Genomics Analysis (3R35GM139540-04S1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/11099368. Licensed CC0.

