# Automating Systematic Reviews of Environmental Health Literature with Machine Learning

> **NIH R43** · IDOC SOFTWARE, INC. · 2022 · $255,961

## Abstract

Project Summary
IDOC Software proposes the development of Artificial Intelligence (AI) algorithms, together with user-friendly
software, for facilitating the efficient production of Systematic Reviews (SRs) in the field of Environmental
Health (EH). SR is the “Gold Standard” for assessing evidence to be used for decision making in a variety of
health contexts, including health care, public health and environmental health.
SRs synthesize evidence from studies that meet eligibility criteria based on the decision being made (such as
hazard identification or risk assessment). All relevant studies need to be considered in an SR, meaning that all
of the potentially related articles must be evaluated one by one. For example, if the SR question relates only to
40- to 65-year-old women, then studies that include men or women outside this age range must be
excluded from the final set of articles used to draw a conclusion. The time (and expense) involved in screening
potentially thousands of citations is substantial, often taking a team of screeners months to complete. This
severely limits the number of SRs that can be conducted and threatens timely decisions by policy makers.
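To make the screening step concrete, here is a minimal sketch of how the population criterion from the example (women aged 40 to 65) could be checked once a study's population has been reduced to structured fields. The `Study` record, its fields, and `meets_population_criteria` are hypothetical illustrations, not part of IDOC's software; in real screening the population must first be read out of free text, which is the hard part the project targets.

```python
from dataclasses import dataclass


# Hypothetical, simplified study record; real screening works from
# free-text titles, abstracts, and full texts rather than clean fields.
@dataclass
class Study:
    study_id: str
    sexes: frozenset[str]  # sexes included in the study population
    min_age: int           # youngest participant age, in years
    max_age: int           # oldest participant age, in years


def meets_population_criteria(study: Study, lo: int = 40, hi: int = 65) -> bool:
    """Return True only if the population is women aged lo..hi (inclusive)."""
    only_women = study.sexes == frozenset({"female"})
    within_range = lo <= study.min_age and study.max_age <= hi
    return only_women and within_range


studies = [
    Study("A", frozenset({"female"}), 45, 60),          # eligible
    Study("B", frozenset({"female", "male"}), 40, 65),  # includes men
    Study("C", frozenset({"female"}), 30, 55),          # too young
]
included = [s.study_id for s in studies if meets_population_criteria(s)]
print(included)  # ['A']
```

Every candidate article has to pass a check like this for each PECO element, which is why screening thousands of citations by hand takes so long.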
AI has tremendous potential to accelerate the conduct of SRs by automatically recognizing words that relate to
eligibility criteria; however, there are significant challenges. In the field of EH, the same study populations,
exposures, and health outcomes can be described with many different combinations of words and phrases. It
is difficult for AI algorithms to generalize language in the way needed to overcome the complexity inherent in
these scientific communications.
IDOC Software has developed algorithms capable of deducing connections between words and phrases.
These learned connections are formed around an EH framework, or ontology, known as PECO: Population,
Exposure, Comparator, and Outcome. The software maps key words and phrases in an article onto these
categories and then highlights these terms in the article text via color-coding. A screener then need not read an
entire article to determine if it meets the eligibility criteria. Instead, the screener scans the “P” colored words to
determine if the population studied meets the “P” inclusion criteria. Then the “E” colored words can be
evaluated, and so on. This makes screening many times faster.
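The mapping-and-highlighting idea can be illustrated with a minimal sketch that substitutes a fixed phrase lookup for IDOC's learned algorithms. The lexicon entries, `PECO_LEXICON`, `highlight_peco`, and the `peco-P`/`peco-E`/`peco-C`/`peco-O` CSS class names are all hypothetical; the sketch only shows how tagged phrases could be wrapped so that a viewer can color them by category.

```python
import re

# Hypothetical lexicon mapping phrases to PECO categories. IDOC's system
# learns these connections; a fixed lookup table is only a stand-in here.
PECO_LEXICON = {
    "postmenopausal women": "P",
    "women": "P",
    "arsenic": "E",
    "drinking water": "E",
    "unexposed controls": "C",
    "bladder cancer": "O",
}

# Try longer phrases first so "postmenopausal women" wins over "women";
# \b keeps matches on whole words only.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(PECO_LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def highlight_peco(text: str) -> str:
    """Wrap each lexicon match in an HTML <mark> tagged with its PECO category."""

    def _wrap(match: re.Match) -> str:
        category = PECO_LEXICON[match.group(0).lower()]
        return f'<mark class="peco-{category}">{match.group(0)}</mark>'

    return _PATTERN.sub(_wrap, text)


print(highlight_peco("Arsenic in drinking water and bladder cancer among postmenopausal women"))
# <mark class="peco-E">Arsenic</mark> in <mark class="peco-E">drinking water</mark> and
# <mark class="peco-O">bladder cancer</mark> among <mark class="peco-P">postmenopausal women</mark>
```

A lookup table like this also exposes the weakness described earlier: any phrasing not listed (for example, "inorganic As in well water") goes unhighlighted, which is why the project relies on learned connections rather than fixed word lists.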
The challenge for the AI algorithms is then to find all the PECO words and phrases and to categorize them
accurately. High accuracy requires taking into account causal and other relationships between the words and
phrases. Advances in machine learning and natural language processing, made first on article titles
and abstracts in Phase I and then on the full text of articles in Phase II, will result in more efficient conduct of SRs,
reducing costs and time, and thereby furthering the goal of making timely evidence-informed decisions and
policy to protect public health from unsafe environmental exposur...
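The abstract does not say how PECO accuracy is measured. One common convention, assumed here purely for illustration, is exact-match span scoring per category: "find all the PECO words and phrases" corresponds to recall, and "accurately categorize them" to precision. The `Span` alias, `per_category_scores`, and the example spans (character offsets into the sentence "Arsenic in drinking water and bladder cancer among postmenopausal women") are hypothetical.

```python
Span = tuple[int, int, str]  # (start offset, end offset, PECO category)


def per_category_scores(gold: set[Span], predicted: set[Span]) -> dict[str, tuple[float, float]]:
    """Exact-match (precision, recall) for each PECO category.

    A predicted span counts as correct only if its start, end, and category
    all equal a gold span. Empty denominators score 0.0 by convention.
    """
    scores = {}
    for cat in "PECO":
        g = {s for s in gold if s[2] == cat}
        p = {s for s in predicted if s[2] == cat}
        true_pos = len(g & p)
        precision = true_pos / len(p) if p else 0.0
        recall = true_pos / len(g) if g else 0.0
        scores[cat] = (precision, recall)
    return scores


gold = {(0, 7, "E"), (11, 25, "E"), (30, 44, "O"), (51, 71, "P")}
# Misses "drinking water" and tags only "women" instead of "postmenopausal women".
predicted = {(0, 7, "E"), (30, 44, "O"), (66, 71, "P")}
scores = per_category_scores(gold, predicted)
print(scores)  # {'P': (0.0, 0.0), 'E': (1.0, 0.5), 'C': (0.0, 0.0), 'O': (1.0, 1.0)}
```

Exact matching is strict: the truncated population span earns no credit at all. Partial-overlap scoring is a common, more lenient alternative, and a category with neither gold nor predicted spans (here C) is arguably undefined rather than zero.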

## Key facts

- **NIH application ID:** 10378843
- **Project number:** 1R43ES033854-01
- **Recipient organization:** IDOC SOFTWARE, INC.
- **Principal Investigator:** Eitan Agai
- **Activity code:** R43 (SBIR Phase I)
- **Funding institute:** NIH (National Institute of Environmental Health Sciences, per the ES code in the project number)
- **Fiscal year:** 2022
- **Award amount:** $255,961
- **Award type:** 1 (new application)
- **Project period:** 2022-02-01 → 2023-10-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10378843

## Citation

> US National Institutes of Health, RePORTER application 10378843, Automating Systematic Reviews of Environmental Health Literature with Machine Learning (1R43ES033854-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10378843. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
