# GENOMIC INDEXING OF COMMON FUND DATASETS

> **NIH OT2** · BAYLOR COLLEGE OF MEDICINE · 2021 · $605,261

## Abstract

**The accessibility of information relevant for interpreting genetic variants.** As the pace of
whole-genome sequencing (WGS) accelerates, the interpretation of the 99% of variants revealed by WGS
that are non-coding is of increasing importance. High-volume “omic” profiling datasets generated by
Common Fund (CF) projects, together with derived information about specific variants and regulatory
elements, have great potential to inform the interpretation of such non-coding variants. NIH CF Roadmap
Epigenome, GTEx and CF Extracellular RNA Communication Consortium (CF ERCC) already generate publicly
shareable information about the impact of individual regulatory variants in the form of allele-specific
DNA methylation, chromatin states, transcription factor binding, phased allelic transcription, e/sQTL
information, and transcription status of non-coding and coding variants. More detailed information that
may help evaluate the functional effects of a variant in specific haplotype and tissue contexts can be
gleaned from privacy-protected multi-omic profiles of biosamples containing the variant. Both the
summary-level and the protected variant information are currently fragmented, poorly accessible, and
lacking in interoperability.

**The scope of this project.** By leveraging resources from the CF ERCC and NHGRI ClinGen projects
(Table 1), we will unlock the value of variant information across Common Fund projects. One immediate
aim will be to improve the FAIRness of Extracellular RNA Communication Consortium (ERCC) data and make
it accessible via the CFDE Data Portal. Two additional aims cut across CF projects and involve the
development of infrastructure for “genomic indexing”, i.e., for aggregating variant information across
CF projects and for making the information FAIR and useful in both the research and clinical use cases.
The information will be aggregated at the summary level, which does not compromise the identity of
participants, and also at a more detailed level that will be protected and will require authorized
access. We will initiate the cross-CF integration with variant information from the Roadmap Epigenome,
GTEx, and Extracellular RNA Communication Consortium (ERCC) projects. In collaboration with GTEx (see
letter from Dr. Kristin Ardlie, GTEx PI), we will make the aggregated variant information accessible via
web UIs and APIs hosted at the CFDE Data Portal.

**The research and clinical diagnostic use cases.** The research use cases will come from the NIH
Common Fund Gabriella Miller Kids First-KOMP2 (see letters from Dr. Bruce Gelb, KidsFirst-KOMP2 PI,
and Dr. Sharon Plon, KidsFirst PI). The clinical diagnostic use cases will come from the NHGRI Clinical
Genome Resource (ClinGen) project (see letter from Dr. Sharon Plon, ClinGen PI). In collaboration with
these projects, we will develop data models, data flows, and system designs, and will participate in the
validation of the systems. This collaboration among Drs. Gelb, Plon, and Milosavljevic will build on
several years of their previous collaboration within Cli...
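
The abstract does not specify a data model, so the following is only a minimal Python sketch of the two-tier "genomic indexing" idea it describes: annotations from several CF projects are aggregated under one variant key, and protected records are returned only to authorized callers. The class names (`VariantKey`, `Annotation`, `GenomicIndex`), the `authorized` flag, and the example coordinates are all hypothetical placeholders for illustration, not the project's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantKey:
    """Hypothetical variant identifier (assumption: chrom/pos/ref/alt)."""
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass(frozen=True)
class Annotation:
    """One piece of variant information contributed by a project."""
    project: str      # e.g. "GTEx"
    kind: str         # e.g. "eQTL"
    protected: bool   # True if authorized access is required


class GenomicIndex:
    """Aggregates annotations from many projects under one variant key."""

    def __init__(self):
        self._by_variant = defaultdict(list)

    def add(self, key, annotation):
        self._by_variant[key].append(annotation)

    def lookup(self, key, authorized=False):
        # Summary-level records are always visible; protected records
        # are filtered out unless the caller is authorized.
        records = self._by_variant.get(key, [])
        return [a for a in records if authorized or not a.protected]


# Made-up placeholder variant and annotations, for illustration only.
key = VariantKey("chr1", 12345, "A", "G")
index = GenomicIndex()
index.add(key, Annotation("GTEx", "eQTL", protected=False))
index.add(key, Annotation("Roadmap Epigenome", "allele-specific methylation", protected=False))
index.add(key, Annotation("ERCC", "sample-level multi-omic profile", protected=True))

print([a.project for a in index.lookup(key)])
print(len(index.lookup(key, authorized=True)))
```

In this sketch an unauthorized lookup sees only the two summary-level records, while an authorized lookup sees all three. A real system would replace the boolean flag with an actual authorization check and would likely use a standardized variant identifier.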

## Key facts

- **NIH application ID:** 10468528
- **Project number:** 3OT2OD030547-01S1
- **Recipient organization:** BAYLOR COLLEGE OF MEDICINE
- **Principal Investigator:** Aleksandar Milosavljevic
- **Activity code:** OT2
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $605,261
- **Award type:** 3
- **Project period:** 2020-09-24 → 2023-09-23

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10468528

## Citation

> US National Institutes of Health, RePORTER application 10468528, GENOMIC INDEXING OF COMMON FUND DATASETS (3OT2OD030547-01S1). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10468528. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*