# Statistical Methods for Genome Characterization

> **NIH P01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2023 · $283,461

## Abstract

Project 3: Statistical Methods for Genome Characterization

Understanding the role that genes play in life is a key issue in biomedical sciences, yet the
overwhelming majority of sequences in public databases remain uncharacterized. Functional
annotation is important for a variety of downstream analyses of genetic data. Yet experimental
characterization of function remains costly and slow, making computational prediction an
important endeavor. This project therefore proposes three Aims focused on functional genomics.
In our first Specific Aim, we propose to develop a probabilistic evolutionary model built upon
phylogenetic trees and experimental Gene Ontology functional annotations that allows automated
prediction of function for unannotated genes. We will develop a probabilistic hierarchical modeling
framework that will allow joint inference, and borrowing of strength, across a family of related
trees. We expect this to significantly improve overall accuracy. Our approach will provide a
scalable computational method that will enable gene annotation to be kept up to date regardless
of the flow of new experimental data. Our second Aim focuses on the development of improved
statistical methods for pathway analysis. Such methods aim to detect over-representation of
members of a super-structure, such as a genetic pathway, in a list of objects of interest from an
experimental or statistical analysis. However, pathway definitions are not consistent between
resources, with the overlap between two definitions of the same pathway on differing resources
being as low as 30%. In this Aim we will develop methods that focus on the network structure
itself, which is much more robust. Our third Aim focuses on analysis of epigenetic conservation.
The epigenome dictates cell phenotype and it is increasingly possible to infer which genes are
silenced or expressed by measuring the epigenome of a cell. Cancers are characterized by
multiple genes that show both hypermethylation and hypomethylation relative to normal tissues.
We will develop advanced statistical methods to assess how conservation of DNA methylation
varies along the genome, and validate them using measures of ‘essentiality’ taken from the Cancer
Dependency Map and drug sensitivity data taken from the Genomics of Drug Sensitivity in Cancer
(GDSC) Project.
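
For readers unfamiliar with the over-representation testing mentioned under the second Aim, the conventional form of such an analysis can be sketched as a one-sided hypergeometric test. This is only an illustration of the standard approach that the abstract contrasts against, not the network-based method the project proposes; the function name, parameter names, and the example counts are all hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_pathway, n_hits, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: total number of genes considered (hypothetical background set)
    n_pathway:  number of those genes annotated to the pathway
    n_hits:     size of the list of genes of interest
    n_overlap:  number of genes of interest that fall in the pathway
    """
    max_overlap = min(n_pathway, n_hits)
    # Number of ways to draw the gene list from the background set.
    total = comb(n_universe, n_hits)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Illustrative counts only (not taken from the grant): 20,000 background genes,
# a 100-gene pathway, a 200-gene hit list, 8 of which are pathway members.
p_value = overrepresentation_pvalue(20_000, 100, 200, 8)
print(f"over-representation p-value: {p_value:.3g}")
```

Because the result depends directly on which genes a resource assigns to the pathway (`n_pathway` and `n_overlap` here), inconsistent pathway definitions across resources can change the outcome of a test like this.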

## Key facts

- **NIH application ID:** 10707463
- **Project number:** 5P01CA196569-08
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Paul Marjoram
- **Activity code:** P01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $283,461
- **Award type:** 5
- **Project period:** 2016-07-01 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10707463

## Citation

> US National Institutes of Health, RePORTER application 10707463, Statistical Methods for Genome Characterization (5P01CA196569-08). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10707463. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
