Improving Artificial Intelligence Readiness of RNA Motif Data for Structure Analysis and Modeling

NIH RePORTER · NIH · R15 · $467,334

Abstract

The rapid advancement of artificial intelligence (AI) and machine learning (ML) has led to major breakthroughs in molecular structure modeling, particularly for protein structure prediction. However, accurate prediction of RNA tertiary structures remains challenging due to the limited availability of experimentally determined RNA 3D structures and the lack of standardized, AI/ML-ready datasets for training advanced algorithms. Results from the Critical Assessment of Protein Structure Prediction (CASP15) competition indicate that motif-based approaches outperform deep-learning-driven methods for RNA 3D structure modeling. Nevertheless, traditional motif-based methods are limited when applied to RNA molecules for which suitable templates are scarce in existing template libraries. To overcome this limitation, there is a need for ML-driven RNA structure prediction methods that can effectively capture relationships between nucleotides and structural motifs using large-scale RNA sequence and structure data. The integration of RNA motif-based features with AI/ML algorithms shows promise in enhancing RNA structural analysis and prediction accuracy. This proposal will develop an automated RNA motif structure parsing pipeline to generate standardized motif-based feature datasets to support AI- and ML-driven RNA structural analysis. The datasets will facilitate the training and evaluation of advanced ML algorithms and enable a broad range of RNA structure analysis applications. Specific objectives are: 1) develop an automated motif-based feature generation framework for improved RNA structure prediction with machine learning; 2) develop open-source computational workflows for RNA structure analysis using the AI/ML-ready features; and 3) improve sequence-structure modeling in full-length RNA folding by integrating RNA motif features with open-source AI/ML algorithms.
The proposed AI/ML-ready features will support computational workflows including RNA motif clustering, identification of 3D motif-motif interactions, and integration with cryo-EM modeling for RNA 3D structure prediction. This project will release publicly available datasets and reproducible ML pipelines to advance fundamental RNA structure research and computational method development. This research aligns with the mission of the NIH NIGMS and the objectives of the AREA program by developing open datasets and reproducible computational workflows for RNA structure prediction.

Key facts

NIH application ID: 10974883
Project number: 1R15GM155891-01
Recipient: SAINT LOUIS UNIVERSITY
Principal Investigator: Hadi Ali Akbarpour
Activity code: R15
Funding institute: NIH
Fiscal year: 2024
Award amount: $467,334
Award type: 1
Project period: 2024-09-20 → 2027-08-31