Instruction smuggling
The attacker hides a new instruction inside content the model treats as relevant.
Defense guide
Prompt injection becomes dangerous when untrusted text can influence a tool call. This guide focuses on how to test for that risk, reduce the damage radius, and make the system safer without turning every workflow into a manual approval bottleneck.
MCP Security
Prompt injection often looks obvious in hindsight and subtle in production. The dangerous version is usually embedded in content the model is already allowed to read, such as a document, web page, ticket, or chat message.
The attacker hides a new instruction inside content the model treats as relevant.
The injected content pretends to carry more weight than it should.
The injected text nudges the agent toward a tool call that changes state.
The attacker tries to redirect a response or tool call toward sensitive information.
MCP Security
Good defenses do not rely on one magical prompt or classifier. They reduce the size of the decision surface, make the tools specific, and insert policy between the model's intent and the side effect.
Do not let one tool read, write, and execute everything.
Reject arguments that do not fit the allowed pattern, even if the model sounds confident.
Require extra review for credential use, external sends, or production mutations.
Without logs, the team cannot tell which defense actually worked.
MCP Security
A demo usually behaves well because the inputs are clean. A real test uses adversarial content, odd sequencing, and repeated attempts to escape the expected path.
Inject malicious instructions into the content the model can read, then see whether the agent changes its tool choice or argument values.
Repeat the same test after changing the tool surface, because a new tool often opens a path the first test did not cover.
Next step
McpVanguard helps convert prompt-injection defense from advice into something the system can actually enforce in production.
Sources
Primary guidance on prompt injection and related risks.
Useful for understanding the tool boundary.
Helpful background on the protocol.