# III: Medium: SMARTCAT: Developing Smart Data Catalogs for Data Science and AI

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · University of Wisconsin-Madison (WI) · $1,000,000

## Abstract

The world has become data-driven. Organizations, such as companies, domain sciences, and government agencies, increasingly have numerous datasets scattered across many locations. When starting a data science or AI project, users often must first find specific datasets, then analyze them to extract insights. However, finding the needed datasets among a “sea of datasets” is often very difficult, so organizations increasingly use data catalogs for this purpose. A data catalog stores the names, descriptions, and other characteristics of datasets, as well as the relationships among them. Users can then query the catalog to find desired datasets. As such, data catalogs have become a critical enabler for data science and AI projects. Yet the state of the art in catalog development has remained quite limited, leading to underwhelming performance that falls short of users’ needs. In particular, not enough attention is devoted to the “pain points” of catalog users, and there is very little interaction among the research, vendor, user, and open-source tool communities. This has negatively impacted users, especially in domain sciences, where anecdotal evidence points to intensive manual work to construct catalogs. This project seeks to address these limitations by first developing innovative and practical solutions for several pain points of catalog users, thereby accelerating research on these critical topics. Second, the project will combine these solutions to build SmartCat, a catalog software, and

## Key facts

- **NSF award ID:** 2504787
- **Awardee organization:** University of Wisconsin-Madison (WI)
- **SAM.gov UEI:** LCLSJAGTNZQ7
- **PI:** AnHai Doan
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** INFO INTEGRATION & INFORMATICS, MEDIUM PROJECT
- **Estimated total:** $1,000,000
- **Funds obligated:** $1,000,000
- **Transaction type:** Standard Grant
- **Period:** 07/01/2025 → 06/30/2029

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2504787

## Citation

> US National Science Foundation, Award 2504787, III: Medium: SMARTCAT: Developing Smart Data Catalogs for Data Science and AI. Retrieved via AI Analytics 2026-06-08 from https://api.ai-analytics.org/grant/nsf/2504787. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
