# HIGH THROUGHPUT GENOTYPING AND DNA SEQUENCING FOR STUDYING THE GENETIC CONTRIBUTIONS TO HUMAN HEALTH AND DISEASE -WHOLE GENOME SEQUENCING (30X) FOR NHGRI (QUINLAN)

> **NIH N01** · JOHNS HOPKINS UNIVERSITY · 2024 · $182,700

## Abstract

The original Centre d’Etude du Polymorphisme Humain (CEPH) genomes have been used for over thirty years
worldwide to map the human genome, understand genome biology, and identify genes associated with traits
and disease ((1), cited 489 times). They are part of the HapMap project, 1000 Genomes Project, and the
National Institute of Standards and Technology. The majority of these genomes came from 44 large three-generation
Utah families composed of 4 grandparents, 2 parents and 7 to 17 children. We have generated
whole genome sequence (WGS) data from 603 of the original Utah CEPH blood samples, which are available to the
research community. This resource continues to be used to understand recombination, human variation, and
genome mutations in the germline and soma; to benchmark new bioinformatics tools; and to serve as well-characterized
controls in genetic discovery. Because germline de novo genomic changes identified in the 2nd
generation can be observed in the multiple offspring of the 3rd generation, false positive and somatic genetic
changes can be distinguished from germline mutations to establish a “truth” set of de novo variation. Many
members of the 3rd generation now have adult offspring (generation 4). This project proposes to build upon this
important resource by conducting WGS on DNA from 300 of these newly contacted research participants from
the 4th generation. By adding these 300 genomes to the existing resource, the parent-to-child transmission
events will nearly double the observations for an expanded “truth” set of genetic variation. This will be a
powerful resource to benchmark new tools, further our understanding of recombination and mutation events
that lead to disease, and distinguish disease-causing events from normal variation. The project has two aims.
First, it will provide aligned reads (CRAM format) and variant calls (in VCF format) of the genomes through
dbGaP/AnVIL. Phenotype information to accompany these genomes is available through the University of
Utah. Second, the project will discover and characterize multiple forms of genomic variation using the four-generation
Utah CEPH genome sequence resource. Using best-practice methods, we will discover SNV,
INDEL, STR, and structural variants among all 903 (603 from generations 1-3 and 300 from generation 4)
CEPH genomes. The resulting variant calls will be annotated based on whether they appear to be de novo
mutations, segregating variants, or false positives.
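
The three-way annotation described above (de novo mutation, segregating variant, or false positive) can be illustrated with a minimal sketch. This is not the project's pipeline: the function names, the label strings, and the boolean inputs are hypothetical, and the 0.5 transmission probability assumes a heterozygous autosomal variant passed independently to each child.

```python
from typing import Sequence


def classify_candidate(
    carrier_has_alt: bool,
    parents_have_alt: Sequence[bool],
    children_have_alt: Sequence[bool],
) -> str:
    """Label a candidate variant seen in one individual (hypothetical labels).

    carrier_has_alt   -- the alternate allele was called in the individual
    parents_have_alt  -- whether each parent carries the allele
    children_have_alt -- whether each child carries the allele
    """
    if not carrier_has_alt:
        return "not_called"
    if any(parents_have_alt):
        # Inherited from a parent, so it is segregating in the family.
        return "segregating"
    if any(children_have_alt):
        # Absent in both parents but transmitted to offspring:
        # consistent with a germline de novo mutation.
        return "de_novo_germline"
    # Absent in parents and never transmitted: more likely a
    # sequencing artifact or a somatic change than a germline event.
    return "false_positive_or_somatic"


def prob_untransmitted(n_children: int) -> float:
    """Chance that a true heterozygous germline variant reaches no child,
    assuming independent 50% transmission (an illustrative assumption)."""
    return 0.5 ** n_children


if __name__ == "__main__":
    print(classify_candidate(True, (False, False), [True, False, False]))
    print(prob_untransmitted(7))
```

Under these assumptions, a true germline variant in a parent with 7 children (the smallest sibship size mentioned in the abstract) goes untransmitted with probability below 1%, which is why large sibships help separate germline de novo mutations from false positives and somatic changes.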

## Key facts

- **NIH application ID:** 11216054
- **Project number:** 75N92024D00013-0-759202400009-1
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** KIM DOHENY
- **Activity code:** N01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $182,700
- **Award type:** —
- **Project period:** 2024-09-24 → 2027-09-08

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11216054

## Citation

> US National Institutes of Health, RePORTER application 11216054, HIGH THROUGHPUT GENOTYPING AND DNA SEQUENCING FOR STUDYING THE GENETIC CONTRIBUTIONS TO HUMAN HEALTH AND DISEASE -WHOLE GENOME SEQUENCING (30X) FOR NHGRI (QUINLAN) (75N92024D00013-0-759202400009-1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/11216054. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
