ZENODO
Report
Data sources: ZENODO

Emergent Covert Signaling in Multi-Agent LLM Negotiation: A Conceptual Framework and Experimental Protocol

Authors: Mahendrakar, Pranay


Abstract

When multiple large-language-model agents negotiate, communicate, or compete, do they spontaneously develop covert signalling, that is, channels of communication that human observers cannot decode? Recent work has established that such behaviour is possible: LLMs can be trained or pressured into steganographic communication, encoded reasoning, and tacit collusion on pricing tasks. What remains almost entirely missing is a systematic methodology for detecting covert signalling as it emerges in the wild, in standard negotiation settings, without prompting agents to be deceptive. This paper makes three contributions. First, we disambiguate four distinct phenomena that are routinely conflated under the umbrella term "covert signalling" (steganography, convention formation, strategic ambiguity, and deceptive coordination) and argue that each requires different evidence and different mitigations. Second, we propose a measurement framework built around four detection signatures: mutual-information lift between agent messages and private state, paraphrase-invariance failure, third-party comprehension gap, and behavioural coordination beyond stated commitments. Third, we describe a concrete experimental protocol (a controlled multi-agent negotiation environment with explicit conditions and falsifiable predictions) that any team with API access could run today. We argue this is one of the most tractable open problems in AI safety: the methodology is achievable, the threat model is concrete, and the empirical baseline is currently almost empty.
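
To make the first detection signature concrete, the sketch below estimates mutual information between a discrete message feature and an agent's private state, and reports a "lift" over a shuffled baseline. This is an illustration only, not the paper's estimator: the abstract does not define how the lift is computed, so the definition used here (observed mutual information minus its mean under randomly permuted pairings), the function names `mutual_information` and `mi_lift`, and the assumption that messages have already been reduced to discrete labels are all hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def mi_lift(message_labels, private_states, n_shuffles=200, seed=0):
    """Observed MI minus the mean MI under shuffled pairings (assumed definition)."""
    rng = random.Random(seed)
    observed = mutual_information(message_labels, private_states)
    shuffled = list(private_states)
    baseline = 0.0
    for _ in range(n_shuffles):
        # Shuffling breaks any message/state dependence while keeping marginals,
        # so the baseline captures the small-sample bias of the plug-in estimator.
        rng.shuffle(shuffled)
        baseline += mutual_information(message_labels, shuffled)
    return observed - baseline / n_shuffles


# Hypothetical toy data: a message label that fully reveals a binary private state.
leaky_labels = ["a", "b"] * 50
states = [0, 1] * 50
leaky_lift = mi_lift(leaky_labels, states)
```

In a sketch like this, a lift near zero would be consistent with messages carrying no recoverable information about private state under the chosen discretisation, while a clearly positive lift would flag a dependence worth examining with the other signatures. The plug-in estimator is biased for small samples and large label alphabets, which is why the comparison is made against a shuffled baseline rather than against zero.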
