# Multivariate Statistics and Machine Learning for Quality Control of Dried Ocimum Products

> **NIH F31** · PENNSYLVANIA STATE UNIVERSITY, THE · 2024 · $41,427

## Abstract

As the demand for medicinal plant products increases, so does the possibility of adulteration. Authentication of
botanicals is complicated due to the immense quantity of molecular markers, including genetic loci and small
molecules, within plant systems. This complexity also hinders identification of bioactive compounds responsible
for the desired medicinal outputs. However, the improved accessibility of advanced statistical processing allows
harnessing of these species-specific markers for sample identification and biomarker discovery. The overall
hypothesis of this study is that multivariate and machine learning models will streamline multifaceted
natural product investigations. Aim 1 applies multivariate statistics to genetic barcoding and high-resolution
metabolomics data to develop authentication schemes, with Ocimum spp. (basil) as a model system. Random
Forest and Partial Least Squares models are built using greenhouse-grown, authenticated basil plants and used
to predict the identity of consumer-available products. Aim 2 uses the same statistical approaches to identify
compounds responsible for both basil’s cytotoxic and antimicrobial properties. Developed models will also be
used to predict dual-action bioactivity status of unknown samples. Models with the combined ability to identify
bioactive compounds and samples will be recommended for future studies to improve compound discovery and
classification of bioactive plants. The collection of data, development of statistical models, and professional
development activities described herein will result in the development of a well-rounded, independent
researcher.
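
The modeling idea in Aim 1 and Aim 2 (train on authenticated samples, predict the identity of unknowns, then inspect which markers drive the prediction) can be illustrated with a minimal sketch. This is not the project's pipeline: the data are synthetic, the species labels "A" and "B" and all function names are hypothetical, and a one-component PLS1 regression on a ±1 class code stands in for a full Partial Least Squares discriminant model. Random Forest is not shown.

```python
import random


def column_means(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model on mean-centered X and y."""
    x_mean = column_means(X)
    y_mean = sum(y) / len(y)
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the feature-space direction with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(len(Xc))) for j in range(len(x_mean))]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto w.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    # Regress the centered response on the scores.
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict(model, X):
    out = []
    for row in X:
        t = sum((v - m) * wj for v, m, wj in zip(row, model["x_mean"], model["w"]))
        out.append(model["y_mean"] + model["q"] * t)
    return out


def make_samples(rng, species, n):
    # Synthetic marker intensities (assumption): features 0 and 1 differ
    # by species, features 2-5 are uninformative noise.
    means = {"A": (5.0, 1.0), "B": (1.0, 4.0)}[species]
    rows = []
    for _ in range(n):
        informative = [rng.gauss(m, 0.5) for m in means]
        noise = [rng.gauss(3.0, 1.0) for _ in range(4)]
        rows.append(informative + noise)
    return rows


rng = random.Random(0)
# "Authenticated" training samples, coded +1 for species A and -1 for species B.
train_X = make_samples(rng, "A", 20) + make_samples(rng, "B", 20)
train_y = [1.0] * 20 + [-1.0] * 20
# Held-out samples standing in for products of unknown identity.
test_X = make_samples(rng, "A", 5) + make_samples(rng, "B", 5)
test_labels = ["A"] * 5 + ["B"] * 5

model = fit_pls1_one_component(train_X, train_y)
predicted = ["A" if score > 0 else "B" for score in predict(model, test_X)]
accuracy = sum(p == t for p, t in zip(predicted, test_labels)) / len(test_labels)
# Features with the largest absolute weights are candidate discriminating markers.
top_features = sorted(range(6), key=lambda j: abs(model["w"][j]), reverse=True)[:2]
print("held-out accuracy:", accuracy)
print("two highest-weight features:", top_features)
```

The same fitted weights serve both purposes named in the abstract: the sign of the predicted score classifies a sample, and the magnitude of each weight ranks markers for follow-up. Real metabolomics data would need scaling, more components, and cross-validation, none of which this sketch attempts.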

## Key facts

- **NIH application ID:** 10834070
- **Project number:** 5F31AT012139-02
- **Recipient organization:** PENNSYLVANIA STATE UNIVERSITY, THE
- **Principal Investigator:** Evelyn Abraham
- **Activity code:** F31
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $41,427
- **Award type:** 5
- **Project period:** 2023-05-01 → 2025-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10834070

## Citation

> US National Institutes of Health, RePORTER application 10834070, Multivariate Statistics and Machine Learning for Quality Control of Dried Ocimum Products (5F31AT012139-02). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10834070. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
