# Accelerating Data and Metadata Standards in the Environmental Health Sciences Study of Emerging Water Contaminants

> **NIH R24** · YALE UNIVERSITY · 2024 · $621,605

## Abstract

Water contamination is one of the biggest public health concerns of the day. Chemical contamination of drinking water can lead to a wide range of chronic adverse health impacts, including cancer and developmental, neurological, and reproductive damage. Populations worldwide are exposed via drinking water to a myriad of chemicals that have recently been classified as emerging contaminants (ECs). These ECs originate from personal care products, pesticides, plastics, and numerous other emissions to the environment. Only a handful of ECs have been extensively evaluated for human exposure and health impacts. Little is known about the impact of emerging water contaminants on human health. Data-driven environmental health sciences (EHS) research offers hope of filling this knowledge gap. However, that hope will not be fully realized unless the data are FAIR (findable, accessible, interoperable, and reusable). Without FAIR data, it would be very challenging to integrate diverse types of exposure-related data that are heterogeneous in format and structure and difficult to find. Making data FAIR to enable integrative exposure studies involves the following objectives: i) open development, extension, adoption, and refinement of data and metadata standards; ii) software tools to implement the standards; and iii) engagement with stakeholders across different communities. This proposal leverages scientific use cases to engage with the EHS and data science communities to achieve these objectives. It assembles a multidisciplinary team of biomedical researchers, environmental science and engineering experts, and data scientists. The proposed use cases represent complementary types of EC exposure studies. We will use them as a foundation for developing strategies to tackle the complex data integration challenge. The project has the following specific aims.
1. Creating rich machine-readable metadata as part of developing a minimum information standard for environmental exposure assessment.
2. Annotating, mapping, and extracting data with the use of ontologies and common data elements (CDEs).
3. Harmonizing exposure-related data with a graph model to build an environmental exposure knowledge graph.
4. Engaging the user community through expert panels, workshops, social networking, and NIEHS-sponsored meetings.
5. Evaluating the impact of the proposed project using appropriate metrics, including user surveys, assessment of data FAIRness, usability, and NLP evaluation metrics such as accuracy, precision, recall, and F-measures.
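
The NLP evaluation metrics named in the last aim can be illustrated with a minimal Python sketch. It is not the project's evaluation code: the function name `precision_recall_f`, the `beta` parameter, and the contaminant names in the example are assumptions made purely for illustration. It compares a set of extracted terms against a hand-curated gold set and computes precision, recall, and the F-measure. Accuracy is left out because it also needs a count of true negatives, which a set of extracted terms does not define.

```python
def precision_recall_f(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of extracted terms.

    `predicted` holds the terms an extraction tool produced and `gold` holds
    the reference annotations. Both are hypothetical inputs for illustration.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: share of extracted terms that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference terms that were found.
    recall = true_positives / len(gold) if gold else 0.0

    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    # F-beta is a weighted harmonic mean; beta = 1 gives the usual F1 score.
    b2 = beta ** 2
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Hypothetical example: contaminant mentions extracted from one document.
gold_terms = {"PFOA", "PFOS", "atrazine", "bisphenol A"}
extracted_terms = {"PFOA", "PFOS", "caffeine"}

p, r, f1 = precision_recall_f(extracted_terms, gold_terms)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this made-up example, two of the three extracted terms are correct and two of the four reference terms are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.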

## Key facts

- **NIH application ID:** 10840122
- **Project number:** 1R24ES036135-01
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** KEI-HOI CHEUNG
- **Activity code:** R24
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $621,605
- **Award type:** 1
- **Project period:** 2024-05-15 → 2029-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10840122

## Citation

> US National Institutes of Health, RePORTER application 10840122, Accelerating Data and Metadata Standards in the Environmental Health Sciences Study of Emerging Water Contaminants (1R24ES036135-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10840122. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
