GENOMIC INDEXING OF COMMON FUND DATASETS

NIH RePORTER · NIH · OT2 · $605,261

Abstract

The accessibility of information relevant for interpreting genetic variants. As the pace of whole-genome sequencing (WGS) accelerates, the interpretation of the 99% of variants revealed by WGS that are non-coding is of increasing importance. High-volume “omic” profiling datasets generated by Common Fund (CF) projects, and the derived information about specific variants and regulatory elements, have great potential to inform the interpretation of such non-coding variants. The NIH CF Roadmap Epigenome, GTEx, and CF Extracellular RNA Communication Consortium (CF ERCC) projects already generate publicly shareable information about the impact of individual regulatory variants in the form of allele-specific DNA methylation, chromatin states, transcription factor binding, phased allelic transcription, e/sQTL information, and transcription status of non-coding and coding variants. More detailed information that may help evaluate the functional effects of a variant in specific haplotype and tissue contexts can be gleaned from privacy-protected multi-omic profiles of biosamples containing the variant. Both the summary-level and the protected variant information are currently fragmented, poorly accessible, and lacking in interoperability.

The scope of this project. By leveraging resources from the CF ERCC and NHGRI ClinGen projects (Table 1), we will unlock the value of variant information across Common Fund projects. One immediate aim is to improve the FAIRness of Extracellular RNA Communication Consortium (ERCC) data and make it accessible via the CFDE Data Portal. Two additional aims cut across CF projects and involve the development of infrastructure for “genomic indexing”, i.e., for aggregating variant information across CF projects and making it FAIR and useful in both research and clinical use cases. The information will be aggregated at the summary level, which does not compromise the identity of participants, and also at a more detailed level that will be protected and will require authorized access.
We will initiate the cross-CF integration with variant information from the Roadmap Epigenome, GTEx, and Extracellular RNA Communication Consortium (ERCC) projects. In collaboration with GTEx (see letter from Dr. Kristin Ardlie, GTEx PI), we will make the aggregated variant information accessible via web UIs and APIs hosted at the CFDE Data Portal.

The research and clinical diagnostic use cases. The research use cases will come from the NIH Common Fund Gabriella Miller Kids First-KOMP2 (see letters from Dr. Bruce Gelb, KidsFirst-KOMP2 PI, and Dr. Sharon Plon, KidsFirst PI). The clinical diagnostic use cases will come from the NHGRI Clinical Genome Resource (ClinGen) project (see letter from Dr. Sharon Plon, ClinGen PI). In collaboration with these projects, we will develop data models, data flows, and system designs, and will participate in the validation of the systems. This collaboration between Drs. Gelb, Plon, and Milosavljevic will build on several years of their previous collaboration within Cli...

Key facts

NIH application ID: 10468528
Project number: 3OT2OD030547-01S1
Recipient: BAYLOR COLLEGE OF MEDICINE
Principal Investigator: Aleksandar Milosavljevic
Activity code: OT2
Funding institute: NIH
Fiscal year: 2021
Award amount: $605,261
Award type: 3
Project period: 2020-09-24 → 2023-09-23