Data Science Guided Organic Reaction Development

NIH RePORTER · NIH · R35 · $503,479

Abstract

PROJECT SUMMARY The overarching objective of our program is to define general, data-science-driven workflows that incorporate physical organic precepts and can be deployed directly within the reaction optimization process. Successfully developing such a workflow would have three key impacts on the chemical synthesis enterprise. 1) It would significantly streamline the empirical and costly process of reaction optimization. 2) Algorithms could be applied to predict how new substrates, catalysts, and reagents (as well as reaction conditions) will perform in the reaction of interest, since extrapolations of this sort are poorly intuited. Knowing the generalizability of a reaction quantitatively will rapidly accelerate the uptake of new methods in chemical synthesis. 3) Because the data-driven tools described herein use physical organic methods to describe molecules mathematically, the correlations derived from empirical data can be interpreted to provide mechanistic insight into how catalysts and substrates interact. This provides a foundation for “transferring” knowledge to new reactions and for developing general catalyst design principles. We plan to continue giving the community a compelling reason to change the culture of reaction development from empirical optimization and observation to an insightful, efficient process that produces high-quality data. This work will be carried out in the context of asymmetric catalysis and will focus on the following question: can we develop tools that predict reaction outcomes for entirely new examples not represented in the training dataset used for the initial correlation, while keeping the statistical models interpretable and explainable? We will address this question by exploring various enantioselective processes catalyzed by a wide range of catalysts and by interrogating these processes with modern computational chemistry and statistical methods.
We will validate these new approaches by exploring whether data mining and new data collection can be used to build correlations with the structural features of molecules that predict altogether new examples. Within this effort, we will ask fundamental questions about how catalyst dynamics, coupled with non-covalent interactions, affect catalyst performance and how to compile this information into new catalyst design strategies. Ultimately, we plan to deliver to the community a platform and pathway that facilitate reaction optimization holistically using easy-to-apply data science methods.

Key facts

NIH application ID
10115086
Project number
5R35GM136271-02
Recipient
UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
Principal Investigator
MATTHEW S SIGMAN
Activity code
R35
Funding institute
NIH
Fiscal year
2021
Award amount
$503,479
Award type
5
Project period
2020-04-01 → 2025-03-31