# GHUCCTS N3C COVID data mapping

> **NIH UL1** · GEORGETOWN UNIVERSITY · 2021 · $99,864

## Abstract

A major challenge to full utilization of the available data and resources has been the complex nature of health
data and the heterogeneity of data sources (including unstructured clinical notes), combined with a lack of
standards. The lack of standards precludes semantic interoperability across platforms and between institutions.
Instead, current approaches utilize resource-intensive natural language processing to extract, transform, and
correlate data from different sources for analysis. To improve translational science and accelerate research to
improve patient outcomes, many new and innovative studies are leveraging large volumes of available data
through standardized and shared data initiatives. With current advances in computing and health data analysis
tools, methods and access, and to make data more meaningful, open, and accessible, research studies have
moved beyond traditional retroactive reporting to pragmatic interventions and predictive capabilities. Ongoing
efforts focus on exploiting common data standards and models such as the Observational Medical Outcomes
Partnership (OMOP) standard, defined by the Observational Health Data Sciences and Informatics (OHDSI)
consortium and accepted as canon by both the NIH and PCORI. Such standards will lead the way to discover insights in
textual narrative, enforce data standardization, and promote scalability and sharing. The OHDSI Common Data
Model (CDM) makes data more meaningful, open, and accessible, which drives translational science and
allows for consistent development of predictive models across different data sources. The National COVID
Cohort Collaborative (N3C), ACT, BD2K-NIH Data Commons, the National Center for Data to Health (CD2H),
and others are among the efforts that will lead to new discoveries and informed decision making, driven by
data science and undergirded by mature Big Data technologies. We propose to design and establish novel,
scalable, and standardized big data processes to massively abstract the raw electronic medical record
datasets for observational studies. This project will develop a secure cloud-based environment to host these
data, as well as the application programming and graphical user interfaces to support observational research
studies leveraging these resources. By these means we will reduce the barriers to data standardization,
annotation and sharing for reproducible analytics and begin to enforce complete semantic and syntactic
interoperability between the resources in the data ecosystem. This effort will enable our investigators to study
the effects of medical interventions, predict patients' health outcomes, and generate the empirical evidence
base necessary to establish best practices in observational analysis.

## Key facts

- **NIH application ID:** 10299876
- **Project number:** 3UL1TR001409-06S3
- **Recipient organization:** GEORGETOWN UNIVERSITY
- **Principal Investigator:** Nawar Shara
- **Activity code:** UL1
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $99,864
- **Award type:** 3
- **Project period:** 2015-08-28 → 2022-01-25

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10299876

## Citation

> US National Institutes of Health, RePORTER application 10299876, GHUCCTS N3C COVID data mapping (3UL1TR001409-06S3). Retrieved via AI Analytics 2026-05-29 from https://api.ai-analytics.org/grant/nih/10299876. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*