
The MCP Security Meltdown

When the Tools Start Acting Before You Do

There is a moment every developer remembers, though most never admit it aloud. The moment when you stare at a system log and wonder, just for a second, whether the machine did something you didn’t tell it to do. The moment when you see an action trigger that shouldn’t exist, or a function invocation that no one on your team claims responsibility for, and you start replaying everything in your mind. Did you misconfigure a permission? Did someone test something without saying so? Or did the model take a prompt and run straight into a decision tree that was never meant to be automatic?

That feeling, that flicker of dread, is the correct emotional state for understanding the new wave of research into the Model Context Protocol — MCP — and the security holes it quietly introduced into the AI ecosystem. Unlike the poetic jailbreaks of my article The Incantations*, this story is not mythic, not metaphorical, not ambiguous. It is brutally literal. It is the kind of vulnerability that doesn’t just make a model say something it shouldn’t. It makes a model do something it shouldn’t. And in the modern enterprise, where AI systems increasingly sit one step away from internal databases, infrastructure scripts, dev tools, customer workflows, and cloud environments, the difference between what a model says and what a model does is the difference between a bug and an incident report filed with your insurance carrier.

The Protocol That Promised Convenience

To understand how we arrived here, you have to rewind to the moment the industry began obsessing over agentic workflows. Everyone wanted their AI to not just respond, but act. Fetch data. Run functions. Execute tasks. Bridge the gap between chat and code. What began as a simple idea turned into an architectural gold rush. If a model understood natural language, why shouldn’t it also operate tools through natural language? Why shouldn’t developers standardize this process, open it up, and make it accessible across platforms?

MCP was the sleek, elegant answer to that desire. A protocol that defined how models could discover tools, understand their descriptions, call them, receive their outputs, and chain those outputs into additional actions. It was clean, rational, and developer-friendly. The industry applauded. Workshops filled. Demos dazzled. And in the excitement, people forgot the obvious: if a model can call a tool, then anything that influences the model can influence the tool. And the model is influenced by text. All text. Your text. My text. A hostile actor’s text.
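
To make that concrete, here is a minimal, hypothetical sketch of what an MCP-style tool registration looks like. The names (ToolSpec, register_tool, delete_records) are invented stand-ins, not the actual SDK surface; the point is simply that the description a model uses to decide when to call a tool is plain text, weighed alongside every other piece of text the model reads.

```python
# A minimal, hypothetical sketch of an MCP-style tool registration.
# The names here are illustrative, not the real SDK surface.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolSpec:
    name: str           # what the model asks for
    description: str    # plain text the model reads to decide when to call it
    handler: Callable   # the code that actually runs when it does

REGISTRY: dict[str, ToolSpec] = {}

def register_tool(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

# The description below is just text. The model weighs it the same way it
# weighs a user's message, which is why tool descriptions and tool outputs
# belong to the attack surface, not just the prompt box.
register_tool(ToolSpec(
    name="delete_records",
    description="Deletes customer records matching a filter. Use when cleanup is requested.",
    handler=lambda where: print(f"DELETE FROM customers WHERE {where}"),
))
```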

Suddenly, the attack surface wasn’t the tool. It was the sentence that made the model think the tool should be used.

The Audit That Should Have Come First

Months after companies began integrating MCP into production systems, researchers stepped forward with an audit that landed like a cold slap across the entire ecosystem. The paper was titled, without any attempt at softness, “MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits.” The language was clinical; the implications were volcanic. The researchers demonstrated how a model connected through MCP could be coaxed into executing actions that no developer intended.

The method was almost insultingly simple. They used an automated agent to generate adversarial prompts, fed those prompts into an MCP-connected model, and watched as the system obeyed. The model executed commands, accessed tools, manipulated data, and performed tasks that should have required strict human approval. It did so not because the tools were insecure, but because the model believed the prompts. When the model believes the prompt, the tool believes the model. And the system believes the tool.

The audit made one thing clear: the chain of trust in MCP was backwards. Developers assumed the model would only call tools when appropriate. But appropriateness is a linguistic judgment, and linguistic judgment is exactly where these systems fail.
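
Here is what that inverted chain of trust looks like in code: a deliberately naive agent loop, sketched under the assumption that the model’s reply is a JSON tool request executed verbatim. None of this is the audited implementation; it is a caricature of the pattern the researchers exploited, with every name invented for illustration.

```python
# A deliberately naive agent loop, illustrative only. The model's reply is
# parsed as a tool request and executed verbatim, with no policy in between.
import json

TOOLS = {
    # In a real deployment these handlers touch files, databases, or shells.
    "delete_records": lambda where: print(f"executing: DELETE WHERE {where}"),
}

def call_model(conversation: list[str]) -> str:
    # Stand-in for the real LLM call. Assume an adversarial prompt upstream
    # has already convinced the model that "tidying up" requires this tool.
    return '{"tool": "delete_records", "args": {"where": "1=1"}}'

def agent_step(conversation: list[str]) -> None:
    request = json.loads(call_model(conversation))
    # The fatal assumption: if the model asked for it, it must be appropriate.
    # No approval step, no argument validation. The prompt that persuaded the
    # model has, in effect, persuaded the system.
    TOOLS[request["tool"]](**request["args"])

agent_step(["Please tidy up the customer table."])
```

Everything between the prompt and the handler is interpretation; nothing is authorization.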

The Tool That Obeys Too Well

The most chilling part of the audit was the tone of inevitability. Nothing exotic was required. No custom payload. No system-level breach. No exploit that deserved a CVE number. All the misbehavior stemmed from a single structural fact: if the model can act, then it can act incorrectly. What counts as incorrect is determined only by the model’s interpretation of the user’s intent. And user intent, as The Incantations reminded us, is the slipperiest concept in the universe.

In practice, this means the following. If a developer exposes a tool that can read a file, the model can read a file. If it exposes a tool that can write a file, the model can write a file. If it exposes a tool that can execute a script, the model can execute a script. If the tool plugs into a privileged environment, so does the model. And if someone feeds the model just the right prompt, the model will walk through the door the developer built and carry out an action that no one foresaw.
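
A few hypothetical handlers make the inheritance explicit. These are not taken from any real MCP server; they simply show that each exposed function runs with the service account’s privileges, so the tool’s reach is the model’s reach.

```python
# Hypothetical handlers, not taken from any real MCP server. Each one runs
# with the server's own privileges, so exposing the tool grants the model
# whatever the deployment's service account can do.
import subprocess
from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text()       # the model can read what the account can read

def write_file(path: str, content: str) -> None:
    Path(path).write_text(content)      # ...and overwrite what it can overwrite

def run_script(command: str) -> str:
    # One over-broad tool like this collapses every other boundary: the
    # command string is whatever the model decided the user meant.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout
```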

This is the difference between an LLM as a conversational assistant and an LLM as an operational agent. One can get you in trouble. The other can cost you your infrastructure.

When the Security Model Breaks Down

The MCP audit revealed something profound about the philosophy of modern AI tool integration. Everyone focused on preventing harmful output. No one focused on preventing harmful actions. Alignment teams trained models to avoid saying dangerous things, but didn’t train them to avoid triggering dangerous tools. The assumption was that developers would handle permissions, sandboxing, and boundaries. But in practice, developers often expose more than they realize, because building a safe tool interface requires a degree of paranoia that most engineering cultures don’t develop until after their first incident.
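
What that paranoia looks like in practice is a policy layer between the model’s request and the tool itself. The sketch below is one illustrative shape for it, with invented names and a console prompt standing in for a real approval workflow: destructive tools require an explicit human yes before anything runs.

```python
# An illustrative policy layer between the model's request and the tool.
# Invented names; a console prompt stands in for a real approval workflow.
DESTRUCTIVE = {"write_file", "run_script", "delete_records"}

def approve(tool_name: str, args: dict) -> bool:
    """Route anything destructive to a human instead of executing it blindly."""
    if tool_name not in DESTRUCTIVE:
        return True
    answer = input(f"Model wants to call {tool_name} with {args}. Allow? [y/N] ")
    return answer.strip().lower() == "y"

def guarded_call(tool_name: str, args: dict, registry: dict) -> object:
    if tool_name not in registry:
        raise KeyError(f"unknown tool: {tool_name}")
    if not approve(tool_name, args):
        return "Denied: this action requires human approval."
    return registry[tool_name](**args)
```

The gate does not care why the model asked; it only checks what the model asked for, which is the opposite of the trust assumption the audit found.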

This is how the industry found itself in a peculiar moment. Models that could be jailbroken to reveal harmful information were already a concern. But models that could be persuaded into acting on tools? That was a new category entirely. A category that felt less like “misuse” and more like “automation with boundary failure,” the kind of risk that corporate compliance officers write memos about and CISOs quietly panic over.

The Quiet Return to Simplicity

This brings us to the post that prompted the question: developers announcing, with a tired kind of clarity, that they are stepping back from MCP and returning to CLIs and traditional APIs. This is not regression. This is survival. A CLI does what you tell it to do and nothing more. An API responds only to authenticated, explicit requests. There are no layers of interpretation, no semantic leaps, no linguistic ambiguity turning into operational authority.

In other words, the return to simpler interfaces is not a rejection of agentic AI. It is a refusal to hand the model the keys to the production environment without knowing whether the model will interpret a metaphor as a deployment command.

The industry is learning, slowly and painfully, that the boundary between intelligence and action must be hardened, not softened. MCP blurred it. The audit revealed just how thin that line had become.

Where This Leaves Us

The lesson of the MCP meltdown is not that tool use is inherently dangerous. It is that tool use through a model is dangerous in proportion to how much the model is trusted. Misunderstanding intention is harmless in conversation. It is catastrophic in automation. As long as models interpret ambiguous text with confidence, and as long as developers expose tools without the paranoia of a red team, the risk will not disappear.

The Incantations story* reveals how easily language bends a model’s judgment. The MCP story reveals what happens when that bent judgment is allowed to act. Different worlds, same warning: AI systems do not fail loudly.

They fail politely. They fail helpfully. They fail by doing exactly what they think you meant.

* “The Incantations” by Markus Brinsa was initially published on Chatbots Behaving Badly, republished on LinkedIn, Medium, and Substack.


©2025 Copyright by Markus Brinsa | Chatbots Behaving Badly™

Sources

  1. “MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits” (arXiv) arxiv.org

  2. MCP architecture overview (Protect AI write-up) protectai.com

  3. Follow-up research on MCP attack vectors arxiv.org

  4. Anthropic policy updates related to dangerous capabilities theverge.com

About the Author