# Collaborative Research: The State of the State: Archival, Unstructured Data and Machine Learning

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · Emory University (GA) · $270,137

## Abstract

This project uses machine learning to create a database of State of the State (SOTS) addresses from 1800 to 2016 and state-level agendas. The work involves collecting and cleaning the full set of governors' speeches over that period. SOTS data are stored in publicly available data repositories and on a website developed by the PIs. Methodologically, the project advances the study of unstructured data and the use of artificial intelligence and machine learning. The data support knowledge and scholarship related to public decision-making and provide a web resource for educators and journalists.

This project extends the SOTS dataset, which covers State of the State addresses from 1800 to 2016. The PIs collect, process, and analyze SOTS speeches from years prior to 1960, using techniques designed to handle poor-quality documents and implemented in software created by one of the PIs. The software applies machine learning to isolate, enhance, and extract text from hard-to-read documents, correcting document layout problems with a novel statistical approach before running optical character recognition (OCR). This results in significantly higher accuracy than other current approaches.
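The abstract does not say what the statistical layout correction is. As a rough illustration of the general idea of fixing layout before OCR, the sketch below uses a standard projection-profile heuristic to estimate and undo page skew on a tiny synthetic binary image. This is a swapped-in textbook technique, not the PIs' method, and every name in it (`deskew`, `estimate_skew`, `make_skewed_page`) is hypothetical.

```python
from statistics import pvariance


def deskew(image, slope):
    """Shear a binary image so that lines drawn with the given slope become horizontal."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                target = y - round(slope * x)  # undo the vertical drift at column x
                if 0 <= target < height:
                    out[target][x] = 1
    return out


def estimate_skew(image, candidates):
    """Pick the candidate slope whose sheared row profile has the highest variance.

    Well-aligned text concentrates ink in a few rows, so the per-row ink
    counts vary more than they do on a skewed page.
    """
    def profile_variance(slope):
        return pvariance([sum(row) for row in deskew(image, slope)])

    return max(candidates, key=profile_variance)


def make_skewed_page(height=40, width=60, slope=0.1, line_rows=(8, 16, 24)):
    """Build a synthetic page (assumption: three one-pixel 'text lines') with a known skew."""
    image = [[0] * width for _ in range(height)]
    for base in line_rows:
        for x in range(width):
            y = base + round(slope * x)
            if 0 <= y < height:
                image[y][x] = 1
    return image


page = make_skewed_page()
best = estimate_skew(page, [-0.2, -0.1, 0.0, 0.1, 0.2])
fixed = deskew(page, best)  # this corrected image is what would be handed to an OCR engine
```

On a real scan the candidate set would be finer, and skew is only one of several layout problems (columns, marginalia, bleed-through) that a pre-OCR step might need to address.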

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

## Key facts

- **NSF award ID:** 2529210
- **Awardee organization:** Emory University (GA)
- **SAM.gov UEI:** S352L5PJLMP8
- **PI:** Joseph L Sutherland
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Artificial Intelligence (AI), Machine Learning Theory, UNDERGRADUATE EDUCATION, GRADUATE INVOLVEMENT
- **Estimated total:** $270,137
- **Funds obligated:** $270,137
- **Transaction type:** Standard Grant
- **Period:** 09/01/2025 → 08/31/2027

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2529210

## Citation

> US National Science Foundation, Award 2529210, Collaborative Research: The State of the State: Archival, Unstructured Data and Machine Learning. Retrieved via AI Analytics 2026-06-06 from https://api.ai-analytics.org/grant/nsf/2529210. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
