The Hardened Shell: Evaluating Safety and Sovereignty in the OpenClaw Agent Architecture

Recent advances in large language models have accelerated the development of autonomous agent systems capable of long-running execution, tool use, and persistent memory. These systems are increasingly positioned as “sovereign” assistants: software entities that operate continuously on behalf of users, ingest information from the open internet, and act directly upon local and networked environments. OpenClaw, formerly Clawdbot and Moltbot, emerged as one of the most prominent implementations of this paradigm in early 2026, rapidly popularizing a design pattern now replicated across the agent ecosystem. While public discourse around OpenClaw focused on claims of emergent intelligence and AGI-adjacent behavior, far less attention was paid to the security and governance assumptions embedded within its architecture. This paper argues that such omissions are not incidental. Rather, they reflect a broader Agentic Paradox: as agents are granted greater autonomy to perform complex tasks, they are simultaneously exposed to new classes of manipulation that traditional security models are ill-equipped to address. Modern agentic workflows routinely grant systems “eyes” to read private communications, “hands” to execute shell commands, and memory to store and reinterpret past interactions, often within the same trust domain. These capabilities are typically assembled from components originally designed for isolated or short-lived use, then deployed into persistent, network-connected environments without corresponding revisions to trust boundaries, identity models, or execution constraints. As a result, assumptions such as locality, benign memory, and trusted tooling persist long after they cease to be defensible.

Keywords

Agent Architectures, LLM Safety, Prompt Injection, AI Security, Red Teaming, Instruction Following
