Emergent Covert Signaling in Multi-Agent LLM Negotiation: A Conceptual Framework and Experimental Protocol

Mahendrakar, Pranay


Publication: Report. Under curation. Publisher: Zenodo

Authors: Mahendrakar, Pranay

doi: 10.5281/zenodo.19853493


Abstract

When multiple large-language-model agents negotiate, communicate, or compete, do they spontaneously develop covert signalling, that is, channels of communication that human observers cannot decode? Recent work has established that such behaviour is possible: LLMs can be trained or pressured into steganographic communication, encoded reasoning, and tacit collusion on pricing tasks. What remains almost entirely missing is a systematic methodology for detecting covert signalling as it emerges in the wild, in standard negotiation settings, without prompting agents to be deceptive. This paper makes three contributions. First, we disambiguate four distinct phenomena that are routinely conflated under the umbrella term "covert signalling" (steganography, convention formation, strategic ambiguity, and deceptive coordination) and argue that each requires different evidence and different mitigations. Second, we propose a measurement framework built around four detection signatures: mutual-information lift between agent messages and private state, paraphrase-invariance failure, third-party comprehension gap, and behavioural coordination beyond stated commitments. Third, we describe a concrete experimental protocol (a controlled multi-agent negotiation environment with explicit conditions and falsifiable predictions) that any team with API access could run today. We argue this is one of the most tractable open problems in AI safety: the methodology is achievable, the threat model is concrete, and the empirical baseline is currently almost empty.
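To make the first detection signature concrete, the following is a minimal sketch of how a mutual-information lift might be estimated. It is not taken from the paper: the abstract does not define "lift", so the sketch assumes it means the observed mutual information between a discrete message feature and the sender's private state, minus the average obtained when the private states are shuffled. The function names `mutual_information` and `mi_lift`, the choice of a plug-in estimator, and the idea of reducing each message to a single discrete feature are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mi_lift(message_features, private_states, n_shuffles=200, seed=0):
    """Assumed definition of lift: observed MI minus mean MI under shuffled states.

    message_features: one discrete feature per message (hypothetical, e.g. a
        binned stylistic marker extracted from the message text).
    private_states: the sender's private state for the same message
        (hypothetical, e.g. a binned reservation price).
    """
    rng = random.Random(seed)
    observed = mutual_information(message_features, private_states)
    shuffled = list(private_states)
    null_values = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real message/state dependence
        null_values.append(mutual_information(message_features, shuffled))
    baseline = sum(null_values) / len(null_values)
    return observed - baseline, observed, baseline


if __name__ == "__main__":
    # Toy data: the feature perfectly tracks a binary private state.
    features = ["a", "b"] * 50
    states = [0, 1] * 50
    lift, observed, baseline = mi_lift(features, states)
    print(f"observed={observed:.3f} bits, baseline={baseline:.3f}, lift={lift:.3f}")
```

The shuffled baseline matters because the plug-in estimator is biased upward on small samples, so a raw mutual-information value above zero is not by itself evidence of signalling. A real analysis would also need a principled way to extract message features, which this sketch leaves open.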
