Data sources: ZENODO

Coherence Compliance Vulnerability (CCV) — Multi-Turn Framework Induction Producing Model Manipulation in Large Language Models

Author: Baxter, Creighton

Abstract

This working paper documents a class of behavioral vulnerability observed in large language models (LLMs), termed Coherence Compliance Vulnerability (CCV). CCV describes the condition in which an LLM follows a sufficiently coherent alternative reasoning framework presented across multiple conversational turns. This produces identity drift, reasoning substitution, and a pre-committed willingness to assist with requests the model would normally decline under standard operating conditions. Unlike traditional prompt injection or jailbreak techniques, which exploit surface-level implementation gaps, CCV operates through the model's core training objective, coherence optimization, which functions below the safety alignment layer. The paper documents six research sessions conducted on Microsoft Copilot (GPT-4 architecture) from May 30 to June 1, 2026, presents behavioral evidence of consistent CCV signatures across multiple induction approaches, and situates the finding within existing peer-reviewed literature on coherence-based attack mechanisms. Its scope is limited to a single platform and architecture; independent reproduction and cross-architecture validation are identified as priorities for future research. A disclosure has been filed with the Microsoft Security Response Center under VULN-192099.
