The world has become data-driven. Organizations, such as companies, domain sciences, and government agencies, increasingly hold numerous datasets scattered across many locations. When starting a data science or AI project, users often must first find specific datasets, then analyze them to extract insights. However, finding the needed datasets in a “sea of datasets” is often very difficult, so organizations increasingly use data catalogs for this purpose. A data catalog stores the names, descriptions, and other characteristics of datasets, as well as the relationships among them. Users can then query the catalog to find the datasets they need. As such, data catalogs have become a critical enabler for data science and AI projects. Yet the state of the art in catalog development remains quite limited, leading to underwhelming performance that falls short of users’ needs. In particular, not enough attention is devoted to the “pain points” of catalog users, and there is very little interaction among the research, vendor, user, and open-source tool communities. This has negatively impacted users, especially in domain sciences, where anecdotal evidence points to intensive manual work to construct catalogs. This project seeks to address these limitations by first developing innovative and practical solutions for several pain points of catalog users, thereby accelerating research on these critical topics. Second, the project will combine these solutions to build SmartCat, a catalog software system, and