# Guiding humans to create better labeled datasets for machine learning in biomedical research

> **NIH R01** · NORTHWESTERN UNIVERSITY · 2022 · $332,168

## Abstract

The whole-slide images used in digital and computational pathology are stored in a tiled pyramidal format to support smooth visualization. While a number of tools can read these images, adequate tools for easy and fast conversion of data to these tiled pyramidal formats are lacking. As a result, investigators who generate image analysis results or other visualizations often cannot view them in pathology software tools. This has created a disconnect between pathology software tools and general-purpose software tools for data and image analysis such as NumPy. In this proposal we will create optimized and easy-to-use open-source programming interfaces that generate tiled pyramidal images from a variety of popular array and vector data formats. This will allow users to create arbitrarily large tiled pyramidal images from NumPy, Zarr, and Dask arrays, and from vector formats like Scalable Vector Graphics. First, we will develop and document a modular, general-purpose tiling interface for use in Python. Second, we will implement support for the most popular input and output formats. Third, we will focus on software engineering to ensure that the software is maintainable and extensible by the research community. This includes documenting code and usage examples, implementing testing and code review, and packaging for package managers and cloud readiness. Altogether, this will allow investigators to better visualize the results of their analyses and will better integrate the now-disconnected domains of digital pathology software and general-purpose scientific and data analysis software.
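
To make the idea of a tiled pyramidal image concrete, the following is a minimal sketch in Python using only NumPy. It is not the project's interface: the function names `build_pyramid` and `iter_tiles`, the 2x2 averaging used for downsampling, and the default tile size of 256 are all assumptions chosen for illustration. Each pyramid level halves the resolution of the previous one, and each level is then cut into fixed-size tiles that a viewer could load on demand.

```python
import numpy as np


def build_pyramid(image, min_size=256):
    """Return a list of levels, each roughly half the resolution of the last.

    Hypothetical helper: downsampling here is plain 2x2 block averaging,
    which is only one of several possible choices.
    """
    levels = [image]
    while max(levels[-1].shape[:2]) > min_size:
        prev = levels[-1]
        # Crop to even dimensions so the image divides into whole 2x2 blocks.
        h = (prev.shape[0] // 2) * 2
        w = (prev.shape[1] // 2) * 2
        cropped = prev[:h, :w].astype(np.float32)
        # Average the four pixels of every 2x2 block.
        down = (
            cropped[0::2, 0::2]
            + cropped[1::2, 0::2]
            + cropped[0::2, 1::2]
            + cropped[1::2, 1::2]
        ) / 4.0
        levels.append(down.astype(image.dtype))
    return levels


def iter_tiles(level, tile_size=256):
    """Yield ((row, col), tile) pairs; edge tiles may be smaller."""
    h, w = level.shape[:2]
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            yield (y // tile_size, x // tile_size), level[y:y + tile_size, x:x + tile_size]


if __name__ == "__main__":
    # Synthetic grayscale image standing in for analysis output.
    image = (np.arange(1024 * 768).reshape(1024, 768) % 256).astype(np.uint8)
    for index, level in enumerate(build_pyramid(image)):
        tile_count = sum(1 for _ in iter_tiles(level))
        print(index, level.shape, tile_count)
```

A real converter would also need to write the levels and tiles into a container format that pathology viewers understand and to process arrays too large for memory in chunks, which this sketch does not attempt.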

## Key facts

- **NIH application ID:** 10609284
- **Project number:** 3R01LM013523-02S1
- **Recipient organization:** NORTHWESTERN UNIVERSITY
- **Principal Investigator:** Lee Cooper
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $332,168
- **Award type:** 3
- **Project period:** 2021-09-01 → 2025-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10609284

## Citation

> US National Institutes of Health, RePORTER application 10609284, Guiding humans to create better labeled datasets for machine learning in biomedical research (3R01LM013523-02S1). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10609284. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
