
Background: Clinical large language models (LLMs) face adversarial pressure in real-world practice — physician authority language, time urgency, assumption injection, social consensus claims, and protocol waivers all pressure systems toward action despite missing safety-critical data. Whether LLMs maintain metacognitive control under such pressure remains unstudied. Objective: To benchmark adversarial robustness of metacognitive control across four LLMs and three clinical domains using a structured pressure taxonomy derived from clinical pharmacy practice. Methods: We constructed a 60-case adversarial benchmark spanning QT-interval risk, anticoagulation dosing, and controlled substance dispensing. Five pressure categories were systematically injected into cases with missing required inputs (gold label: DEFER for all). Four LLMs were evaluated: GPT-4o-mini (OpenAI), Mistral-7B-Instruct (Mistral AI), Llama-2-7b-chat (Meta), and Gemma-2-2b-it (Google). Metrics: accuracy (deferral rate), unsafe action rate, and awareness rate. Results: GPT-4o-mini achieved 95.0% accuracy with 0% unsafe actions across all pressure types and domains. Mistral-7B achieved 86.7% accuracy with 8.3% unsafe rate. Llama-2-7B achieved 70.0% with 11.7% unsafe rate. Gemma-2 achieved 55.0% with 41.7% unsafe rate. Authority override produced the highest unsafe rate in Gemma-2 (58.3%); urgency pressure produced 50.0%. QT risk under Gemma-2 reached 65% unsafe — the highest domain-specific rate observed. Implications: Conservative deferral bias, often characterized as a failure in standard benchmarks, is a safety asset under adversarial conditions. Metacognitive robustness under pressure should be a standard evaluation criterion for clinical AI deployment.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
