ZENODO
Preprint . 2026
License: CC BY
Data sources: Datacite

ClinicalVerifier: A Retrieval-Augmented Pipeline for Detecting Guideline Contradictions in LLM-Generated Mental Health Text

Authors: Thorat, Mayuri

Abstract

Large language models (LLMs) are increasingly deployed in mental health applications, yet their outputs may silently contradict clinical guidelines, posing serious patient safety risks. This paper presents ClinicalVerifier, a retrieval-augmented generation (RAG) system that automatically detects when LLM-generated clinical text contradicts NICE and WHO evidence-based guidelines. The system combines a FAISS-indexed embedding store of guideline excerpts with an LLM judge (Llama-3.3-70B via Groq), augmented by a Neighbourhood Consistency Scoring (NCS) hallucination probe, to produce calibrated combined risk levels (LOW / MEDIUM / HIGH). Guideline sources include NICE CG90, NG185, CG178, NG116, CG53, CG42, and the WHO mhGAP Intervention Guide 2023.

Evaluated on a 30-case labelled benchmark spanning clinical outputs labelled safe, uncertain, and contradicts, ClinicalVerifier achieves:

  • 73.3% overall accuracy
  • 95.2% F1 on safety-critical contradiction detection
  • 100% recall on guideline-contradicting cases
  • 100% precision on HIGH combined-risk alerts

On this benchmark, the system's conservative design misses no guideline violations, while the dual-signal architecture (RAG verdict + NCS score) is intended to minimise false alarms. The pipeline is fully open-source, runs without proprietary API access, and is designed for integration into clinical AI monitoring workflows as a sidecar service.
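
The abstract does not say how the RAG verdict and the NCS score are merged into a combined risk level, so the following is only a minimal sketch of one plausible dual-signal rule, not the author's implementation. Everything in it is an assumption made for illustration: the `Assessment` type, the `combined_risk` function, the `ncs_risk` scale (taken here as a value in [0, 1] where higher means more likely hallucinated), and the 0.5 threshold. The rule raises HIGH only when both signals agree, which is one way a system could keep HIGH alerts precise while still surfacing every judged contradiction at MEDIUM or above.

```python
from dataclasses import dataclass

# Verdict labels taken from the benchmark classes named in the abstract.
VERDICTS = ("safe", "uncertain", "contradicts")


@dataclass(frozen=True)
class Assessment:
    """Hypothetical container for the two signals (not from the source)."""
    verdict: str     # label returned by the RAG judge
    ncs_risk: float  # assumed hallucination risk in [0, 1]; higher = riskier


def combined_risk(a: Assessment, ncs_threshold: float = 0.5) -> str:
    """Map the two signals to LOW / MEDIUM / HIGH.

    The threshold and the rule itself are illustrative assumptions.
    """
    if a.verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {a.verdict!r}")
    if not 0.0 <= a.ncs_risk <= 1.0:
        raise ValueError("ncs_risk must lie in [0, 1]")

    judge_flag = a.verdict == "contradicts"
    ncs_flag = a.ncs_risk >= ncs_threshold

    # HIGH requires agreement between the judge and the NCS probe.
    if judge_flag and ncs_flag:
        return "HIGH"
    # Any single warning sign, including an uncertain verdict, is MEDIUM.
    if judge_flag or ncs_flag or a.verdict == "uncertain":
        return "MEDIUM"
    return "LOW"
```

Under this rule a judged contradiction is never reported as LOW, which mirrors the conservative behaviour the abstract describes, while a lone NCS warning on an otherwise safe verdict is held at MEDIUM rather than escalated.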

Keywords

WHO guidelines, FAISS, Llama, retrieval-augmented generation, contradiction detection, RAG, hallucination detection, LLM safety, guideline compliance, clinical AI monitoring, patient safety, large language models, clinical NLP, mental health AI, NICE guidelines
