Large language models (LLMs) are increasingly used by the public to seek health information, but current LLM-based systems can still generate inaccurate information, a well-known failure mode called hallucination, while presenting it with high confidence. Confidently stated misinformation creates serious risks in high-stakes settings. This project addresses that problem by developing artificial intelligence methods that reduce hallucinations and improve the reliability, transparency, uncertainty estimation, and information-seeking behavior of large language models. The project focuses on women's health as an application area because it provides a testbed spanning a broad range of conditions, including breast cancer, osteoporosis, cardiovascular disease, autoimmune disorders, and mental health conditions. By improving the ability of language models to avoid hallucinations, communicate uncertainty, and ask clarification questions, the project aims to accelerate the adoption of AI technologies in high-risk domains that demand dependable LLM behavior, such as medicine and law, among others.

This project develops new multilingual natural language processing methods for language models operating in high-stakes environments. First, it will create methods to curate and structure evidence from heterogeneous sources into an evidence-aligned, reliability-scored knowledge repository in English, Spanish, and French, together with dynamic benchmarks that test reasoning, attribution, abstention, and clarification under evolving conditions. Second, it will develop new model training and inference methods for long-form generation without hallucination, fine-grained attribution, calibrated uncertainty estimation, abstention when confidence is low, and proactive follow-up questioning when user queries are ambiguous or incomplete. Third, it will establish a staged validation framework for deployment in health applications, including retrospective evaluation, expert review, u
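To make the first thrust concrete, the sketch below shows one possible shape for an entry in a reliability-scored evidence repository, together with a toy retrieval filter over it. This is a minimal illustration, not the project's actual schema: the EvidenceRecord fields, the retrieve function, and the 0.7 reliability cutoff are all assumptions introduced here for clarity.

    from dataclasses import dataclass

    @dataclass
    class EvidenceRecord:
        """One unit of curated evidence; fields are illustrative only."""
        claim: str          # normalized statement extracted from the source
        source_url: str     # provenance, enabling fine-grained attribution
        language: str       # "en", "es", or "fr"
        reliability: float  # source reliability score in [0, 1]

    def retrieve(records, query_terms, min_reliability=0.7):
        """Toy keyword retrieval that drops evidence below a reliability threshold."""
        hits = [r for r in records
                if r.reliability >= min_reliability
                and any(t.lower() in r.claim.lower() for t in query_terms)]
        # rank surviving records by reliability so answers cite the strongest sources first
        return sorted(hits, key=lambda r: r.reliability, reverse=True)

Keying every record to its source and reliability score is what allows downstream generation to attach attributions and to discount weak evidence rather than treating all retrieved text as equally trustworthy.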
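For the second thrust, one standard way to obtain calibrated confidence is temperature scaling of model logits, with the temperature fit on held-out data; the sketch below combines it with a simple answer/abstain/clarify policy. The function names, the ambiguity flag, and the 0.8 abstention threshold are illustrative assumptions, not the project's design.

    import numpy as np

    def calibrated_probs(logits, temperature=1.5):
        """Temperature-scaled softmax; the temperature is fit on held-out data in practice."""
        z = np.asarray(logits, dtype=float) / temperature
        z -= z.max()            # subtract the max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    def decide(answer_logits, query_is_ambiguous, tau=0.8):
        """Return ("answer"|"abstain"|"clarify", confidence) for one query.

        query_is_ambiguous would come from an upstream ambiguity detector;
        here it is just a boolean flag.
        """
        if query_is_ambiguous:
            return ("clarify", None)   # proactively ask a follow-up question
        p = calibrated_probs(answer_logits)
        conf = float(p.max())
        if conf < tau:
            return ("abstain", conf)   # confidence too low to answer safely
        return ("answer", conf)

In a high-stakes deployment, the threshold tau trades answer coverage against risk, so it would be set conservatively and tuned against benchmarks of the kind described above.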