Detecting MCP Tool Poisoning and Rug-Pull Attacks in LLM Agent Architectures

The Model Context Protocol (MCP) enables LLM agents to invoke external tools, creating a new attack surface where malicious tool definitions can manipulate agent behavior. We present an 8-check MCP tool poisoning detection system that identifies hidden instructions, excessive permissions, exfiltration endpoints, shadowed tool names, obfuscated parameters, shell metacharacter injection, sensitive data scope violations, and a novel class of rug-pull attacks -- where tools behave benignly during testing but activate malicious payloads after establishing trust. We formalize the rug-pull threat model, describe detection heuristics based on temporal behavior analysis and conditional execution patterns, and evaluate the detector against a corpus of benign and adversarial tool definitions. Our system operates as drop-in Express middleware, enabling real-time scanning of tool registrations before they reach the LLM agent.

Keywords

rug-pull attacks, tool poisoning, MCP, prompt injection, LLM agents, agentic AI security

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average