Addressing Factual Inaccuracy and Unfaithful Reasoning of Large Language Models in Biomedicine and Healthcare

NIH RePORTER · NIH · R01 · $375,964

Abstract

Large Language Models (LLMs) represent the latest advancement in Natural Language Processing (NLP) and Artificial Intelligence (AI) and hold tremendous potential to transform biomedical and healthcare applications. Extensive research has demonstrated the effectiveness of LLMs in a range of biomedical and health applications, from medical question answering to summarizing systematic reviews and AI-assisted disease diagnosis. However, the major barriers to applying LLMs in biomedicine and healthcare are factual incorrectness, where LLM-generated responses are inaccurate or incomplete, and unfaithful reasoning, where LLM-generated responses lack supporting evidence, contradict existing evidence, or even rely on hallucinated evidence. These issues also risk propagating misinformation, potentially leading to misdiagnosis or incorrect treatment recommendations. Addressing them has been challenging, primarily because of three fundamental obstacles: (1) from the data perspective, LLMs may absorb misinformation from lower-quality or unauthorized sources in general-domain pretraining data and lack access to accurate, up-to-date biomedical knowledge, and consequently generate inaccurate, outdated, or unfaithful results; (2) from the methods perspective, mechanisms for fact-checking and evidence attribution are lacking across the lifecycle of LLMs applied to biomedical and health studies, from training and fine-tuning to inference and post-hoc analysis; (3) from the accountability perspective, few existing approaches have been evaluated for effectiveness in downstream biomedical and health applications. Our overall objective in this proposal is to systematically address factual inaccuracy and unfaithful reasoning of LLMs in biomedicine and healthcare.
The specific aims are: (1) from the data perspective, establishing a self-augmentation framework that teaches LLMs to automatically select and use relevant biomedical digital resources to augment their responses; (2) from the methods perspective, developing an LLM curator that simulates the fact-checking and evidence attribution performed in biocuration, via a multi-stage, multi-task instruction tuning pipeline; (3) from the methods perspective, introducing a step-level, automated feedback-guided paradigm in which LLMs reflect on and improve their intermediate responses via fact-checking and evidence attribution; and (4) from the accountability perspective, evaluating the methods in downstream use cases. The proposed work is expected to address factual incorrectness and unfaithful reasoning of LLMs, the key barriers to their use in biomedical and health domains, and to make LLMs generate accurate and trustworthy responses that advance biomedical discovery and healthcare. It is also expected to refine the current development and evaluation pipelines of LLMs in biomedical and health domains by making fact-checking and evidence attrib...
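To make the idea behind aim (3) more concrete, the following is a minimal, hypothetical sketch of a step-level feedback loop; it is not the project's method. It assumes a reasoning chain is a list of step strings and a toy evidence corpus keyed by made-up ids, and it uses plain token overlap as a stand-in for a real fact-checker. The names `StepFeedback`, `attribute`, `refine_chain`, and the `revise` callback (which stands in for the LLM revising its own step) are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class StepFeedback:
    """Feedback for one reasoning step (hypothetical structure)."""
    step: str                   # final text of the step after any revisions
    evidence_id: Optional[str]  # id of the attributed passage, or None if unsupported
    supported: bool


def attribute(step: str, corpus: Dict[str, str], threshold: float = 0.5) -> Optional[str]:
    """Attribute a step to the corpus passage sharing the largest fraction of its tokens.

    Token overlap is a crude stand-in for a real fact-checking model; returns None
    when no passage covers at least `threshold` of the step's tokens.
    """
    step_tokens = set(step.lower().split())
    best_id, best_score = None, 0.0
    for doc_id, text in corpus.items():
        score = len(step_tokens & set(text.lower().split())) / max(len(step_tokens), 1)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id if best_score >= threshold else None


def refine_chain(
    steps: List[str],
    corpus: Dict[str, str],
    revise: Callable[[str], str],
    max_revisions: int = 1,
) -> List[StepFeedback]:
    """Check each step; ask `revise` (a stand-in for the LLM) to rewrite unsupported steps."""
    feedback = []
    for step in steps:
        evidence_id = attribute(step, corpus)
        revisions = 0
        # Step-level loop: re-check after each revision until supported or out of budget.
        while evidence_id is None and revisions < max_revisions:
            step = revise(step)
            evidence_id = attribute(step, corpus)
            revisions += 1
        feedback.append(StepFeedback(step, evidence_id, evidence_id is not None))
    return feedback


if __name__ == "__main__":
    # Toy evidence store and toy "model" revisions; both are invented for illustration.
    corpus = {"doc-1": "aspirin irreversibly inhibits cyclooxygenase enzymes"}
    fixes = {"aspirin cures all infections": "aspirin inhibits cyclooxygenase enzymes"}
    chain = [
        "aspirin inhibits cyclooxygenase enzymes",
        "aspirin cures all infections",
        "the sky is green",
    ]
    for fb in refine_chain(chain, corpus, lambda s: fixes.get(s, s)):
        print(fb)
```

Checking at the level of individual steps means an unsupported intermediate claim can be flagged or revised before later steps build on it; a practical system could replace token overlap with a trained verifier and retrieval over curated biomedical resources.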

Key facts

NIH application ID: 10946218
Project number: 1R01LM014604-01
Recipient: YALE UNIVERSITY
Principal Investigator: Qingyu Chen
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $375,964
Award type: 1 (new application)
Project period: 2024-08-29 → 2028-07-31