# Data to Design: An Integrated Approach to Developing New Synthetic Methods

> **NIH K99** · UNIVERSITY OF CALIFORNIA LOS ANGELES · 2024 · $122,904

## Abstract

Project Summary
Machine learning (ML)—i.e., the use of computer algorithms that can automatically learn from sampled data and
make predictions or decisions without explicit programming—is increasingly important in a wide array of
applications, from image and speech recognition to product recommendation systems. Meanwhile, synthetic
chemistry plays a central role in the development of medicines, agrochemicals, fine chemicals, and new
materials, but the field has traditionally shown a strong aversion to adopting ML tools.
A fundamental challenge in synthetic chemistry is to expedite access to high-value building blocks in a
predictable and efficient manner to accelerate discovery programs. However, the development and optimization
of new synthetic methodologies have traditionally relied on empirical methods. This trial-and-error approach
wastes crucial time and resources, limits the likelihood of unexpected discoveries, and fails to identify reactivity
cliffs or rationalize the role of additives. The goal of this proposed project is to integrate ML with synthetic
chemistry to provide solutions to these longstanding challenges, particularly in the contexts of medicinal chemistry library
preparation, process optimization, and rapid assembly of chiral bioactive structures. The two aims of this career
development application are: (a) Mentored phase (K99): My short-term goal is to learn ML and data science
tools while developing ML workflows that reduce the number of experiments needed to obtain the desired
outcome of any chemical reaction (i.e., optimization). This will be realized by undertaking three distinct types
of optimization campaign, in the form of three case studies (A1, A2, and A3) that reflect those typically
encountered in chemistry settings. (b) Independent phase (R00): Armed with a better understanding of ML and
data science, my long-term goal is to facilitate design and discovery of robust new asymmetric methods. This will
be achieved by engaging in three different case studies (B1, B2, and B3) where stereoselectivity is currently poor
or nonexistent. These projects will enable me to create my own niche in catalytic research.
Integration of my established expertise (asymmetric synthesis and computational chemistry) with that of the host lab (ML,
data science, and photoredox catalysis), together with enabling technologies from Merck and Genentech (HTE),
will collectively confer the capability to accomplish these overall goals. The excellent facilities of UCLA will be
augmented by close industry collaboration and the active support of the C-CAS consortium. Overall,
through this fellowship, I will gain critical mentored training in both academic and industry settings, build new
professional skills, and achieve distinctive academic independence in biomedical research.

## Key facts

- **NIH application ID:** 10887206
- **Project number:** 1K99GM151453-01A1
- **Recipient organization:** UNIVERSITY OF CALIFORNIA LOS ANGELES
- **Principal Investigator:** Rajat Maji
- **Activity code:** K99
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $122,904
- **Award type:** 1
- **Project period:** 2024-07-01 → 2026-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10887206

## Citation

> US National Institutes of Health, RePORTER application 10887206, Data to Design: An Integrated Approach to Developing New Synthetic Methods (1K99GM151453-01A1). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10887206. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
