# GlyGen Supplement: Develop automatic literature mining tool for extracting context specific glycan-protein data that will enhance the extent and quality of data in GlyGen

> **NIH U01** · UNIVERSITY OF GEORGIA · 2020 · $138,658

## Abstract

With significance in biotechnology, biomedicine, and basic research, glycobiology’s
applications are widespread. Technological advancements in the field of glycobiology have expanded in
parallel with the influx of an array of data within the glycosciences community. The broad range of
experimental approaches, disparate nature of available datasets, and the seemingly piecemeal strategies
required to construct comprehensive interpretations create inherent barriers for glycoscience researchers
to utilize all available information. The mission of GlyGen has been to target and mitigate such challenges
by developing procedures and a platform which integrates or builds upon glycoconjugate structure-function
data from different resources. GlyGen, an NIH-funded international effort, captures and integrates over 90%
of available glycoconjugate data, harmonizing and managing diverse outputs such as glycans, proteins,
and genes integrated with genomics, pathway, and disease information. Since its inception, the GlyGen
team has built a user-friendly platform complete with analytical tools and comprehensive, exportable data
sets to ease the burden for researchers. Within the Swiss Institute of Bioinformatics (SIB), the Proteome
Informatics Group (PIG) has worked extensively to develop the glycoinformatics resource GlyConnect,
which focuses on the molecular characterization of protein glycosylation through an integrated, expertly
curated platform, specializing in structure analysis and producing novel data sets, such as site-specific
glycan data. Despite each resource’s efforts to mitigate challenges, difficulties in amassing the amalgam
of data required to fully examine microheterogeneity within glycobiology still persist. By utilizing their
distinct strengths, the proposed collaborative research between GlyGen and GlyConnect will focus on
further integrating site-specific protein-glycan data to generate more comprehensive data sets, where
increasing the data availability in GlyGen is expected to accelerate basic and translational research.
Currently, the major resources for site-specific protein-glycan data are UniCarbKB and UniProtKB, though
the amount of available data from these databases, or other similar resources, is not substantial. To
address this limitation, GlyConnect and GlyGen will develop an advanced, scalable, and site-specific
protein-glycan annotation pipeline. This pipeline will be constructed using existing data in GlyConnect, in
addition to roughly 100 publications identified and prioritized through current literature mining efforts in
GlyGen. Moreover, front-end and back-end software developments will be implemented on the GlyGen
platform, allowing glycoscience researchers to submit site-specific glycan data through a validated
submission system. The proposed research will create a standardized methodology for more efficient data
submission efforts, expand on the available site-specific protein-glycan data for the glycobiology
communi...

## Key facts

- **NIH application ID:** 10154002
- **Project number:** 3U01GM125267-04S1
- **Recipient organization:** UNIVERSITY OF GEORGIA
- **Principal Investigator:** Raja Mazumder
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $138,658
- **Award type:** 3
- **Project period:** 2017-09-01 → 2022-05-31
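
Several of the fields above are also encoded in the project number itself. The sketch below splits `3U01GM125267-04S1` into its parts, assuming the common NIH layout of application type, activity code, two-letter institute code, six-digit serial number, support year, and optional suffix. The `ProjectNumber` class, its field names, and `parse_project_number` are hypothetical names chosen for illustration, and the pattern is a simplification that may not cover every project number format.

```python
import re
from typing import NamedTuple


class ProjectNumber(NamedTuple):
    """Parts of an NIH project number (field names are illustrative)."""
    application_type: str  # leading digit, e.g. "3"
    activity_code: str     # e.g. "U01"
    institute_code: str    # two-letter code, e.g. "GM"
    serial_number: str     # six digits
    support_year: str      # two digits after the hyphen
    suffix: str            # optional trailing code, e.g. "S1"; may be empty


# Simplified pattern (an assumption): one digit, a three-character activity
# code, a two-letter institute code, six digits, a hyphen, a two-digit
# support year, and an optional alphanumeric suffix.
_PATTERN = re.compile(
    r"^(?P<application_type>\d)"
    r"(?P<activity_code>[A-Z][A-Z0-9]{2})"
    r"(?P<institute_code>[A-Z]{2})"
    r"(?P<serial_number>\d{6})"
    r"-(?P<support_year>\d{2})"
    r"(?P<suffix>[A-Z0-9]*)$"
)


def parse_project_number(text: str) -> ProjectNumber:
    """Split a project number into its parts, or raise ValueError."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized project number: {text!r}")
    return ProjectNumber(**match.groupdict())


if __name__ == "__main__":
    print(parse_project_number("3U01GM125267-04S1"))
```

For this record, the leading `3` matches the award type listed above and `U01` matches the activity code.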

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10154002

## Citation

> US National Institutes of Health, RePORTER application 10154002, GlyGen Supplement: Develop automatic literature mining tool for extracting context specific glycan-protein data that will enhance the extent and quality of data in GlyGen (3U01GM125267-04S1). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10154002. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
