Foundation guide

What Is MCP Security?

MCP security is the practice of inspecting, validating, and enforcing policy on every tool call an AI agent makes through the Model Context Protocol. It sits at the boundary where language model output becomes a system action.

Core question: How do you prevent an agent from acting on unsafe tool calls?
Answer: Deterministic policy enforcement at the tool boundary
Risk without it: Prompt injection can lead to file reads, API calls, or data writes the operator never intended

Context

Why MCP changes the security boundary

Before MCP, the security conversation around AI was mainly about what the model said — prompt injection, jailbreaking, output filtering. MCP adds a second dimension: what the agent does.

When an agent can invoke tools that read files, call APIs, query databases, or trigger automation, the question shifts from "did the model say something unsafe?" to "is the agent about to do something unsafe?"

This is a fundamentally different security problem. Prompt filters and LLM-as-judge approaches can catch obvious attacks, but they are probabilistic: the same unsafe request may be blocked on one attempt and allowed on the next. Deterministic enforcement at the tool-call boundary closes that gap, because a call that matches a blocking rule is rejected every time.

Threat model

The MCP attack surface

Prompt injection into tool calls

Untrusted content embedded in a document, web page, or message steers the agent toward a tool call it would not normally make. The injected text exploits the model's instruction-following behavior to produce a specific tool invocation.

Argument manipulation

Even if the tool choice is correct, an attacker can influence the arguments passed to it. A file-read tool called with an attacker-controlled path can leak sensitive files.
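A minimal sketch of guarding a file-read tool against an attacker-controlled path is shown here. The sandbox root `/srv/agent-workspace` and the function name `is_path_allowed` are illustrative assumptions, not part of MCP or any specific gateway.

```python
from pathlib import Path

# Hypothetical sandbox root; a real deployment would configure this per tool.
ALLOWED_ROOT = Path("/srv/agent-workspace")


def is_path_allowed(requested: str, root: Path = ALLOWED_ROOT) -> bool:
    """Return True only if the requested path resolves inside root."""
    resolved_root = root.resolve()
    # Joining with an absolute path discards the root, so "/etc/passwd"
    # resolves outside the sandbox and is rejected by the check below.
    candidate = (resolved_root / requested).resolve()
    return candidate.is_relative_to(resolved_root)  # Python 3.9+
```

Because `resolve()` collapses `..` segments and follows symlinks that exist on disk, the comparison is made on the final location rather than on the raw string the model produced.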

Tool chaining

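An apparently safe first tool call creates an artifact that a second tool interprets unsafely. For example, a write tool saves attacker-supplied text to a file that a later tool runs as a script. The attacker does not need to control every step, only the critical junction.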

Over-broad tool permissions

A tool that combines read, write, and execute behaviors creates a single point of failure. If the model is manipulated into calling it, the blast radius is far wider than necessary.
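One way to spot over-broad tools is to declare each tool's capabilities and flag any tool that holds more than one. The sketch below assumes a hypothetical `ToolSpec` record and capability labels (`read`, `write`, `execute`); MCP itself does not define these names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical inventory entry describing what a tool can do."""
    name: str
    capabilities: frozenset


def over_broad(tools, max_capabilities: int = 1):
    """Return the names of tools holding more capabilities than allowed."""
    return [t.name for t in tools if len(t.capabilities) > max_capabilities]


inventory = [
    ToolSpec("read_file", frozenset({"read"})),
    ToolSpec("workspace_admin", frozenset({"read", "write", "execute"})),
]
flagged = over_broad(inventory)
```

Here `flagged` contains only `workspace_admin`, which would be a candidate for splitting into narrower tools.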

Controls

The MCP security control stack

A production-ready MCP security posture uses multiple layers. No single control catches everything.

Layer 0: Preflight normalization

Normalize tool call inputs before any policy evaluation. Strip encoding tricks, normalize paths, and reject malformed arguments before they reach deeper layers.
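As an illustration, preflight normalization of a path-like argument might look like the sketch below. The function name, the `MalformedArgument` exception, and the limit of three decoding passes are assumptions chosen for the example.

```python
import posixpath
import unicodedata
from urllib.parse import unquote


class MalformedArgument(ValueError):
    """Raised when an argument cannot be safely normalized."""


def normalize_path_argument(raw: str, max_decode_rounds: int = 3) -> str:
    value = raw
    # Percent-decode repeatedly so double-encoded input is unwrapped.
    for _ in range(max_decode_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    else:
        # The value was still changing on the final pass: reject it.
        raise MalformedArgument("too many encoding layers")
    # Fold Unicode compatibility forms (e.g. fullwidth dots) to ASCII.
    value = unicodedata.normalize("NFKC", value)
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
        raise MalformedArgument("control character in argument")
    # Unify separators, then collapse "." and inner ".." segments.
    return posixpath.normpath(value.replace("\\", "/"))
```

The output is a canonical string, so later layers can match on one form instead of anticipating every encoding variant. A leading `..` survives normalization on purpose, so a deterministic rule can still see and block it.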

Layer 1: Deterministic rules

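Pattern-based rules block known attack patterns such as path traversal, command injection, and credential exfiltration attempts. Because they are deterministic, the same input always produces the same verdict, which makes them easy to test and audit. They catch many automated attacks, but a broad pattern can still flag legitimate calls, so rules should be tuned against real traffic.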
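A deterministic rule layer can be as simple as an ordered list of compiled patterns. The rule IDs and regular expressions below are illustrative assumptions, not a vetted rule set; a real deployment would scope rules per tool and per argument.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: re.Pattern


# Hypothetical example rules, checked in order.
RULES = [
    Rule("path-traversal", re.compile(r"(^|[\\/])\.\.([\\/]|$)")),
    Rule("shell-metachar", re.compile(r"[;&|`]|\$\(")),
    Rule("credential-file",
         re.compile(r"\.ssh/|\.aws/credentials|\.env$", re.IGNORECASE)),
]


def first_match(arguments: dict) -> Optional[str]:
    """Return the ID of the first rule matching any string argument."""
    for value in arguments.values():
        if not isinstance(value, str):
            continue
        for rule in RULES:
            if rule.pattern.search(value):
                return rule.rule_id
    return None
```

The same arguments always yield the same rule ID, which is what makes the verdict reproducible in audit logs. Note that the `shell-metachar` pattern would also match an `&` in a URL query string, which is the kind of over-match that per-tool scoping and tuning are meant to remove.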

Layer 2: Semantic scoring

An optional LLM-based advisor that evaluates whether the combination of tool, arguments, and session context looks suspicious. Useful for catching novel attacks that rules cannot describe.

Layer 3: Behavioral tracking

Monitor tool call sequences across a session. Detect anomalies like rapid escalation, unusual argument patterns, or delegation attempts that bypass earlier controls.
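Session-level tracking can be sketched with a sliding window over recent calls. The tool names, the thresholds, and the two finding labels below are assumptions made for the example.

```python
import time
from collections import deque
from typing import Optional

# Hypothetical tool classifications for the example.
SENSITIVE_TOOLS = {"read_file"}
SINK_TOOLS = {"http_post"}


class SessionTracker:
    def __init__(self, max_calls: int = 20, window_seconds: float = 10.0):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # (timestamp, tool_name) pairs, oldest first

    def record(self, tool_name: str, now: Optional[float] = None) -> list:
        """Record a call and return any findings it triggers."""
        now = time.monotonic() if now is None else now
        self._calls.append((now, tool_name))
        # Drop calls that have aged out of the window.
        while now - self._calls[0][0] > self.window_seconds:
            self._calls.popleft()
        findings = []
        if len(self._calls) > self.max_calls:
            findings.append("rate-exceeded")
        earlier = [name for _, name in list(self._calls)[:-1]]
        # A sensitive read followed by an outbound call inside the window.
        if tool_name in SINK_TOOLS and any(t in SENSITIVE_TOOLS for t in earlier):
            findings.append("sensitive-read-then-send")
        return findings
```

Each call on its own might pass the rule layer; the tracker only reports when the sequence inside the window looks unusual.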

Approach

Audit-only vs enforcement

Most teams start with audit-only mode: log every tool call and its policy verdict, but do not block anything. This lets the team understand their traffic patterns and tune rules before switching to enforcement mode.

In enforcement mode, blocked calls never reach the tool. The agent receives a clear rejection with a reason code, and the operator can review the decision in audit logs. Enforcement requires confidence in the rule set, which is why audit-first adoption is the recommended path.

Audit mode — Log every call, evaluate every policy, block nothing. Use for discovery and tuning.

Balanced mode — Block high-confidence threats, log everything else. Use for initial enforcement.

Strict mode — Block all suspicious calls, require review for ambiguous ones. Use for production-sensitive systems.
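The three modes above can be read as a mapping from a policy verdict to an action. The sketch below assumes four hypothetical verdict levels and three actions; the names are illustrative and do not come from any particular gateway.

```python
from enum import Enum


class Mode(Enum):
    AUDIT = "audit"
    BALANCED = "balanced"
    STRICT = "strict"


class Verdict(Enum):  # hypothetical verdict levels
    CLEAN = "clean"
    AMBIGUOUS = "ambiguous"
    SUSPICIOUS = "suspicious"
    HIGH_CONFIDENCE_THREAT = "high_confidence_threat"


class Action(Enum):
    ALLOW = "allow"
    REVIEW = "review"
    BLOCK = "block"


def decide(mode: Mode, verdict: Verdict) -> Action:
    """Map a verdict to an action; every call is logged regardless."""
    if mode is Mode.AUDIT:
        return Action.ALLOW  # evaluate and log, block nothing
    if mode is Mode.BALANCED:
        if verdict is Verdict.HIGH_CONFIDENCE_THREAT:
            return Action.BLOCK
        return Action.ALLOW
    # Strict: block anything suspicious, hold ambiguous calls for review.
    if verdict in (Verdict.SUSPICIOUS, Verdict.HIGH_CONFIDENCE_THREAT):
        return Action.BLOCK
    if verdict is Verdict.AMBIGUOUS:
        return Action.REVIEW
    return Action.ALLOW
```

Keeping the verdict separate from the action means the same policy evaluation can run in all three modes, so moving from audit to enforcement changes only the mapping, not the rules.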

Available tooling

A growing ecosystem of tools addresses different parts of the MCP security stack:

McpVanguard

Open-source MCP security gateway that implements multi-layer deterministic enforcement, policy composition, and audit logging. Supports monitor, balanced, and strict profiles.

Snyk Agent Scan

Security scanner for agents and MCP servers. Useful for finding known vulnerabilities in server configurations.

MCP Guard

Client-side protection against prompt injection and risky tool use. Useful as an additional layer alongside gateway enforcement.

Start with audit, move to enforcement

McpVanguard supports all three deployment modes. Deploy in audit mode today, tune your policies, and switch to enforcement when your team is ready.
