# Efficient electronic phenotyping using APHRODITE in the Million Veteran Program

> **NIH VA I01** · VETERANS ADMIN PALO ALTO HEALTH CARE SYS · 2021

## Abstract

The Million Veteran Program (MVP) is currently the largest biobank study in the world. The resource provides
an unprecedented opportunity to identify the genetic causes of a variety of human diseases that
disproportionately affect our veterans, including diseases that affect the neurological, cardiovascular, pulmonary,
gastrointestinal, endocrine, and musculoskeletal organs. Fast-paced technological progress over the last 10
years now allows us to reliably and densely profile individuals across their entire genome. Such data has
already been generated and linked to a wide spectrum of human diseases and physiologic traits. However,
many more links remain to be made, which will provide the scientific community with additional important clues
on the root causes of many life-threatening diseases as well as valuable insights on how to develop new drugs
to treat or prevent these same diseases. The current challenge in making these additional discoveries is no
longer the generation of high-quality genetic data in large numbers but rather the organization and querying of
the very large and complex electronic health records (EHR) being leveraged by these large biobank studies. Until
now, much effort and time have been expended to painstakingly develop and validate rules-based definitions to
identify individuals with a specific disease, syndrome, or state across a variety of EHR platforms. However, the
recent mapping of the VA corporate data warehouse to the Observational Medical Outcomes Partnership
common data model (OMOP-CDM) provides us with unprecedented opportunities to apply new “electronic
phenotyping” tools that can identify individuals with a specific disease, syndrome, or state in a much more
efficient manner than rules-based methods. The goal of this proposal is to comprehensively test the ability of
one of these new tools named APHRODITE (Automated PHenotype Routine for Observational Definition,
Identification, Training and Evaluation) to identify established genetic links among MVP participants.
APHRODITE was developed at Stanford by one of our co-investigators and uses state-of-the-art machine
learning algorithms to identify individuals with a condition in a fraction of the time it takes to identify them
through rules-based definitions. The algorithm has shown great promise within the Stanford clinical data
warehouse but requires validation in other EHR cohorts. In aim 1, we will compare the accuracy of an APHRODITE
classifier with that of a rules-based classifier for at least 5 diseases using gold-standard sets in the VA. In aim 2,
we will test whether APHRODITE classifiers from aim 1 can be applied to MVP participants to replicate
established genetic associations. If automated methods in APHRODITE perform as well as or better than
rules-based methods for multiple diseases, automated methods may be leveraged for phenotypes where
rules-based methods may not exist, maximizing the efficiency of genetic discovery in MVP and facilitating rapid
replication ...
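
The aim 1 comparison of two classifiers against a gold-standard set can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the labels are invented toy data, the names `evaluate` and `Metrics` are hypothetical, and the choice of sensitivity, specificity, and positive predictive value as the accuracy measures is ours rather than something the abstract specifies. The code does not use or reproduce APHRODITE's own interface.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float  # share of true cases that the classifier flags
    specificity: float  # share of true non-cases that the classifier clears
    ppv: float          # share of flagged individuals who are true cases


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(predicted: list[int], gold: list[int]) -> Metrics:
    """Score binary predictions (1 = has condition) against gold-standard labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    return Metrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
    )


# Invented toy labels for eight chart-reviewed individuals (not real data).
gold = [1, 1, 1, 1, 0, 0, 0, 0]
rules_based = [1, 1, 0, 0, 0, 0, 0, 1]  # hypothetical rules-based output
automated = [1, 1, 1, 0, 0, 0, 1, 0]    # hypothetical automated-classifier output

for name, predictions in [("rules-based", rules_based), ("automated", automated)]:
    print(name, evaluate(predictions, gold))
```

In a real evaluation the gold-standard labels would come from the validated VA sets mentioned in aim 1, and the comparison would be repeated for each disease.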

## Key facts

- **NIH application ID:** 9955052
- **Project number:** 5I01HX002487-02
- **Recipient organization:** VETERANS ADMIN PALO ALTO HEALTH CARE SYS
- **Principal Investigator:** Themistocles L Assimes
- **Activity code:** I01
- **Funding institute:** VA
- **Fiscal year:** 2021
- **Award amount:** —
- **Award type:** 5
- **Project period:** 2019-08-01 → 2021-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9955052

## Citation

> US National Institutes of Health, RePORTER application 9955052, Efficient electronic phenotyping using APHRODITE in the Million Veteran Program (5I01HX002487-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9955052. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
