SBIR TOPIC 135: Applying Large Language Models for Automated Entity Recognition, Relation Extraction/Ontology Metadata Enrichment from Free-Text Clinical Notes in Infectious/Immune-Mediated Diseases

NIH RePORTER · NIH · N43 · $299,208

Abstract

The growing complexity and volume of biomedical research data, especially in infectious and immune-mediated diseases, call for novel approaches to data enrichment and information extraction. A key challenge is efficiently creating FAIR-compliant metadata from unstructured clinical documents, a process that currently requires significant time and expertise. This project aims to advance the field by developing automated AI models for metadata enrichment and standardization, leveraging John Snow Labs’ Medical Large Language Models (LLMs) and Natural Language Processing (NLP) technology. This includes models that will deliver state-of-the-art accuracy on benchmarks validated by medical doctors, within a production-grade codebase that scales natively on commodity hardware. New models will be trained for Named Entity Recognition (NER) and Entity Resolution to extract unstructured data and standardize it against multiple biomedical ontologies. This research will bring transformative improvements in biomedical data management for infectious and immune-mediated diseases by significantly streamlining metadata enrichment, enhancing data utility for researchers, and simplifying scientific data sharing.

Key facts

NIH application ID: 11214907
Project number: 75N93024C00010-0-9999-1
Recipient: JOHN SNOW LABS INC
Principal Investigator: Hasham Ul Haq
Activity code: N43
Funding institute: NIH
Fiscal year: 2024
Award amount: $299,208
Award type:
Project period: 2024-09-05 → 2025-09-04