# Data Science Guided Organic Reaction Development

> **NIH R35** · UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH · 2021 · $503,479

## Abstract

PROJECT SUMMARY
The overarching objective of our program is to define general data science driven workflows that
incorporate physical organic precepts and can be deployed directly within the reaction
optimization process. Successfully developing such a workflow would have three key impacts
on the chemical synthesis enterprise: 1) it would significantly streamline the empirical, costly process of
reaction optimization; 2) algorithms would be applied to predict how new substrates, catalysts,
and reagents (as well as reaction conditions) perform in the reaction of interest as
extrapolations of this sort are poorly intuited. The ability to know quantitatively the
generalizability of a reaction will rapidly accelerate the uptake of new methods in chemical
synthesis. And 3) as the data driven tools described herein utilize physical organic methods to
describe molecules mathematically, the resulting correlations derived from empirical data can
be interpreted to provide mechanistic insights into how catalysts/substrates interact. This
provides one with the foundation to “transfer” knowledge to new reactions and develop general
catalyst design principles. We plan to continue to deliver to the community a compelling reason
to change the culture of reaction development from empirical optimization and observations to
an insightful, efficient, and high quality data producing process. This work will be accomplished
in the context of asymmetric catalysis and focus on the following question: can we develop tools
to predict reaction outcomes for completely new examples not represented within the training
dataset required for the initial correlation, while simultaneously having interpretable/explainable
statistical models? This will be accomplished by exploring various enantioselective processes
catalyzed by a multitude of catalysts and interrogating the processes using modern
computational chemistry and statistical methods. We will validate these new approaches by
exploring if data-mining and new data collection can be used to build correlations with structural
features of molecules for the prediction of altogether new examples. Within this we will ask
fundamental questions about how catalyst dynamics coupled with non-covalent interactions
impact catalyst performance and how to compile this information for new catalyst design
strategies. Ultimately, we plan to deliver to the community a platform and pathway to facilitate
reaction optimization holistically using easy-to-apply data science methods.

## Key facts

- **NIH application ID:** 10115086
- **Project number:** 5R35GM136271-02
- **Recipient organization:** UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
- **Principal Investigator:** MATTHEW S SIGMAN
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $503,479
- **Award type:** 5
- **Project period:** 2020-04-01 → 2025-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10115086

## Citation

> US National Institutes of Health, RePORTER application 10115086, Data Science Guided Organic Reaction Development (5R35GM136271-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10115086. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
