# CRII: III: Towards Traceable, Affordable and Explainable Automated Feature Generation for Tabular Data

> **NSF 01002425DB NSF RESEARCH & RELATED ACTIVIT** · Clemson University (SC) · $174,811

## Abstract

Features are used to describe the characteristics of objects. For example, "age", "smoking or not", and "years of smoking" are features of a patient, which can be used to describe the patient's physical condition and, furthermore, to predict whether she or he is likely to develop lung cancer. A combination of features can be more helpful for prediction; e.g., "age" minus "years of smoking" yields a new feature indicating the age at which the patient started smoking. Creating new features by combining existing ones in this way is called feature generation. In the big data era, datasets can contain enormous numbers of features, making it unrealistic for human experts to generate features manually. This project will build new technologies to automatically generate new features from existing ones, to better describe the objects, and to achieve better prediction performance. Additionally, this project aims to substantially improve traceability, affordability, and explainability during the generation process. The developed algorithms and tools are expected to generalize to a broad range of scientific and engineering problems, not only in feature generation but also in other domains such as data pre-processing, social analysis, intelligent transportation systems, healthcare, and the internet of things. 
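To make the idea of feature generation concrete, the following is a minimal brute-force sketch, not the project's method (the award describes reinforcement-learning agents that choose which features and operations to use). It assumes a small hypothetical set of arithmetic operators and hypothetical column names (`age`, `smoker`, `years_smoking`), and enumerates every pair of numeric columns under each operator. Commutative operators (`+`, `*`) use unordered pairs, while subtraction uses ordered pairs, since `a - b` and `b - a` differ.

```python
import operator
from itertools import combinations, permutations

# Hypothetical operator set: symbol -> (function, is_commutative).
OPERATORS = {
    "+": (operator.add, True),
    "-": (operator.sub, False),
    "*": (operator.mul, True),
}


def generate_features(rows, numeric_cols):
    """Return copies of `rows` extended with pairwise arithmetic features.

    Each new column is named "<a> <op> <b>". The input rows are not modified.
    """
    expanded = []
    for row in rows:
        out = dict(row)  # copy so the original record stays untouched
        for sym, (fn, commutative) in OPERATORS.items():
            # Unordered pairs avoid duplicate columns such as "a + b" and "b + a".
            pairs = combinations(numeric_cols, 2) if commutative else permutations(numeric_cols, 2)
            for a, b in pairs:
                out[f"{a} {sym} {b}"] = fn(row[a], row[b])
        expanded.append(out)
    return expanded


# Usage with two hypothetical patient records.
patients = [
    {"age": 60, "smoker": 1, "years_smoking": 40},
    {"age": 45, "smoker": 0, "years_smoking": 0},
]
expanded = generate_features(patients, ["age", "years_smoking"])
for record in expanded:
    # "age - years_smoking" is the age at which the patient started smoking.
    print(record["age - years_smoking"])
```

Even this toy enumeration yields one candidate per pair and operator, so the candidate count grows quadratically with the number of columns (and faster still if generated features are combined again). That growth is why exhaustive generation quickly becomes expensive, and it motivates selecting features and operations automatically rather than trying every combination.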

This project identifies three research tasks: (i) A Reinforcement Learning (RL) based approach to realize traceability. Two RL agents are used to select appropriate features, and one RL agent is u

## Key facts

- **NSF award ID:** 2550105
- **Awardee organization:** Clemson University (SC)
- **SAM.gov UEI:** H2BMNX7DSKU8
- **PI:** Kunpeng Liu
- **Primary program:** 01002425DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** INFO INTEGRATION & INFORMATICS, CISE Research Initiation Initiative
- **Estimated total:** $174,811
- **Funds obligated:** $110,820
- **Transaction type:** Standard Grant
- **Period:** 09/01/2025 → 05/31/2027

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2550105

## Citation

> US National Science Foundation, Award 2550105, CRII: III: Towards Traceable, Affordable and Explainable Automated Feature Generation for Tabular Data. Retrieved via AI Analytics 2026-06-06 from https://api.ai-analytics.org/grant/nsf/2550105. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
