
Large language models (LLMs) are increasingly deployed in mental health applications, yet their outputs may silently contradict clinical guidelines, posing serious patient safety risks. This paper presents ClinicalVerifier, a retrieval-augmented generation (RAG) system that automatically detects when LLM-generated clinical text contradicts NICE and WHO evidence-based guidelines. The system combines a FAISS-indexed embedding store of guideline excerpts with an LLM judge (Llama-3.3-70B via Groq), augmented by a Neighbourhood Consistency Scoring (NCS) hallucination probe, to produce calibrated combined risk levels (LOW / MEDIUM / HIGH). Guideline sources include NICE CG90, NG185, CG178, NG116, CG53, CG42, and the WHO mhGAP Intervention Guide 2023. Evaluated on a 30-case labelled benchmark of clinical outputs spanning three classes (safe, uncertain, and contradicts), ClinicalVerifier achieves 73.3% overall accuracy, 95.2% F1 on safety-critical contradiction detection, 100% recall on guideline-contradicting cases, and 100% precision on HIGH combined-risk alerts. On this benchmark, the system's conservative design misses no guideline violations, while the dual-signal architecture (RAG verdict + NCS score) minimises false alarms. The pipeline is fully open-source, runs without proprietary API access, and is designed for integration into clinical AI monitoring workflows as a sidecar service.
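The reported contradiction-detection figures can be related to one another with a short calculation. The sketch below is illustrative only: it assumes the 95.2% F1 and the 100% recall refer to the same positive class (guideline-contradicting outputs), and the counts used in the example (10 true positives, 1 false positive, 0 false negatives) are hypothetical values chosen because they reproduce the rounded figures, not counts taken from the paper. The function names are likewise made up for this example.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical counts (an assumption, not reported in the source):
# every contradicting case is flagged, plus one false alarm.
p, r, f = precision_recall_f1(tp=10, fp=1, fn=0)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")

# Precision implied by the reported F1 and recall, under the same-class assumption.
print(f"implied precision={implied_precision(0.952, 1.0):.3f}")
```

Under that assumption, perfect recall combined with an F1 below 100% implies a contradiction-class precision of roughly 91%, which means at least one non-contradicting output was flagged as a contradiction. This is a separate quantity from the 100% precision reported for HIGH combined-risk alerts.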
WHO guidelines, FAISS, Llama, retrieval-augmented generation, contradiction detection, RAG, hallucination detection, LLM safety, guideline compliance, clinical AI monitoring, patient safety, large language models, clinical NLP, mental health AI, NICE guidelines
