# Collaboration Capacity: A Framework for Measuring Data-Intensive Biomedical Research

> **NIH R01** · SYRACUSE UNIVERSITY · 2020 · $198,211

## Abstract

The goal of this proposed project is to develop a collaboration capacity framework and evaluate the
collaboration capacity of science teams at macro-, meso-, and micro-levels using GenBank metadata
and other related data sources. The framework defines Scientific & Technical (S&T) human capital,
cyberinfrastructure, and science policy as the enablers of collaboration capacity, the impact of which on
collaboration capacity can be measured by data production and data-to-knowledge metrics such as team size
and ratio of data to publications. GenBank metadata, the primary data source for this project, offers
longitudinal coverage (1984-2018) and full research lifecycle traces from data production to publication to
patent application, creating an unprecedented opportunity to study the biomedical research enterprise. This
project will design and create datasets from GenBank metadata to generate analysis-ready data, which will be
combined with statistics from NSF and NIH. The datasets will be used to develop computational models and
test hypotheses that examine the correlation between collaboration capacity, team size, and connectedness
of nodes, as well as the properties of disruptive nodes and their impact on productivity and innovation. In
addition to statistics from NSF and NIH, the project will also combine events in science policy (e.g., mandates
on data sharing), public health (e.g., outbreaks and prevalent chronic diseases), and funding to triangulate
with the datasets and analyze collaboration capacity and policy implications. The data source and theoretical
approach compensate for the limitations of publication-centric data sources used in past research on
collaboration networks. The fact that the primary data source comes from basic biomedical research situates
this study at the cutting edge and allows us to gain more holistic insights into the impact of federal investment
and policy on collaboration capacity. Our future research will use this longitudinal, rich data collection to
continue deeper mining of collaboration in the data production and data-to-knowledge lifecycle, particularly in
relation to specific genes, diseases, and treatments that are key aspects in basic and clinical biomedical
research.
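
The data-production and data-to-knowledge metrics named in the abstract (team size, ratio of data to publications) could be computed along the following lines. This is a minimal sketch and not the project's actual method: the record layout (`submitters`, `n_sequences`, `n_publications`), the sample values, and the function name `collaboration_metrics` are all hypothetical, and real GenBank metadata would first need parsing and author disambiguation.

```python
from statistics import mean

# Hypothetical, hand-made records standing in for parsed GenBank
# submission metadata; the field names are assumptions for illustration.
records = [
    {"submitters": ["A", "B", "C"], "n_sequences": 12, "n_publications": 2},
    {"submitters": ["A", "D"], "n_sequences": 4, "n_publications": 1},
    {"submitters": ["E"], "n_sequences": 6, "n_publications": 0},
]


def collaboration_metrics(records):
    """Return mean team size and the overall data-to-publication ratio."""
    # Team size: number of distinct submitters on each record.
    team_sizes = [len(set(r["submitters"])) for r in records]
    total_data = sum(r["n_sequences"] for r in records)
    total_pubs = sum(r["n_publications"] for r in records)
    # The ratio is undefined when no publications are linked to the data.
    ratio = total_data / total_pubs if total_pubs else None
    return {
        "mean_team_size": mean(team_sizes),
        "data_to_publication_ratio": ratio,
    }


metrics = collaboration_metrics(records)
```

Aggregating over all records, rather than averaging per-record ratios, keeps records with zero linked publications from producing a division by zero; whether that is the appropriate choice depends on how the framework defines the metric.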

## Key facts

- **NIH application ID:** 9981992
- **Project number:** 1R01GM137409-01
- **Recipient organization:** SYRACUSE UNIVERSITY
- **Principal Investigator:** Jeff Hemsley
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $198,211
- **Award type:** 1
- **Project period:** 2020-01-01 → 2022-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9981992

## Citation

> US National Institutes of Health, RePORTER application 9981992, Collaboration Capacity: A Framework for Measuring Data-Intensive Biomedical Research (1R01GM137409-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9981992. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
