
We present the first empirical study of multi-server Model Context Protocol (MCP) orchestration, using a seven-model cross-domain synthesis benchmark. Seventeen real MCP tool calls across six servers (arXiv, PubMed, Firecrawl, Context7, Memory, Filesystem) produced nine cross-domain insights. Seven LLMs (GPT-5.4, DeepSeek R1, Mistral Large 3, Llama 4 Maverick, Gemini 2.5 Flash, Claude Sonnet 4.5, Claude Haiku 4.5) were benchmarked on identical data. All seven independently identified the mechanism-pattern gap: composition patterns for multi-server MCP remain undocumented. Five such patterns are proposed.
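To make the orchestration setting concrete, the following is a minimal sketch of what a multi-server MCP tool call looks like. MCP requests are JSON-RPC 2.0 messages (method `tools/call` with a tool `name` and `arguments`); the server registry, tool names, and routing function below are illustrative assumptions, not details from the study.

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 MCP tools/call request (per the MCP spec)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

def route(call, registry):
    """Pick the server that advertises the requested tool.

    With several MCP servers attached, the orchestrator must decide which
    server receives each call; this is the composition concern the paper
    studies. Tool names here are hypothetical.
    """
    tool = call["params"]["name"]
    for server, tools in registry.items():
        if tool in tools:
            return server
    raise KeyError(f"no server exposes tool {tool!r}")

# Toy registry mirroring the six servers named in the abstract
# (tool names are invented for illustration).
registry = {
    "arxiv": ["search_papers"],
    "pubmed": ["search_pubmed"],
    "firecrawl": ["scrape"],
    "context7": ["get_docs"],
    "memory": ["create_entities"],
    "filesystem": ["write_file"],
}

call = make_tool_call(1, "search_papers", {"query": "MCP orchestration"})
print(route(call, registry))  # arxiv
print(json.dumps(call))
```

The sketch shows only the request shape and a naive name-based router; a real orchestrator must also handle tool-name collisions, result aggregation, and ordering across servers, which is exactly where documented composition patterns are lacking.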
Keywords: Model Context Protocol, tool use, MCP, LLM agents, multi-model benchmark, cross-domain knowledge discovery, composition patterns
