MCP — Model Context Protocol
Tools used to live entirely inside your agent repository, with every Slack, GitHub, and internal API integration wired by hand. Model Context Protocol (MCP) is an open standard that defines how a host (your agent runtime) talks to a server (a process exposing tools, resources, and prompts over JSON-RPC). This guide covers architecture, primitives, FastMCP and Spring AI servers, security threats, and production registry patterns.
After reading, you should be able to: explain MCP host/client/server roles; choose stdio vs SSE transport; implement tools and resources in FastMCP and Spring AI; map MCP schemas to OpenAI function calling; mitigate tool poisoning and resource injection; and operate remote MCP servers with versioning and allowlists.
What is MCP?
Model Context Protocol (MCP) is an open standard that defines how AI applications (hosts, acting through MCP clients) connect to external data sources and tools (servers). Instead of every agent framework inventing its own plugin format, MCP defines JSON-RPC messages, capability discovery, and transport layers so a GitHub server written once can plug into Claude Desktop, Cursor, or your custom host.
Before MCP, tool integration looked like this: copy-paste OpenAI function schemas into your repo, wrap each SaaS API in Python, redeploy when Slack changes scopes, and hope the next model version still calls tools correctly. MCP inverts ownership: the server team publishes a versioned capability surface; the host discovers and allowlists tools at runtime.
MCP vs ad-hoc tool wrappers
| Concern | Ad-hoc tools in repo | MCP server |
|---|---|---|
| Discovery | Hard-coded list in agent code | tools/list at connect time |
| Schema drift | Agent redeploy on every API change | Server version bump; host picks compatible semver |
| Ownership | Platform team owns all integrations | Domain teams ship their own servers |
| Transport | In-process function calls | stdio (local) or HTTP/SSE (remote) |
| Auth | Secrets in agent env | OAuth on transport; scoped tokens per server |
MCP is not a replacement for your agent loop—it is the wire format between host and capability providers. Your orchestration (LangGraph, Spring AI, custom) still decides when to call tools, how many steps, and what guardrails apply.
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def discover_tools(server_cmd: list[str]) -> list[dict]:
    """Spawn a stdio MCP server and return its advertised tools as plain dicts."""
    params = StdioServerParameters(command=server_cmd[0], args=server_cmd[1:])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [{"name": t.name, "description": t.description,
                     "input_schema": t.inputSchema} for t in result.tools]
@Service
public class McpToolDiscovery {
    public List<McpToolDescriptor> discover(McpServerConfig cfg) {
        try (McpSyncClient client = mcpClientFactory.connectStdio(cfg)) {
            client.initialize();
            return client.listTools().tools().stream()
                    .map(t -> new McpToolDescriptor(t.name(), t.description(), t.inputSchema()))
                    .toList();
        }
    }
}
MCP messages follow JSON-RPC 2.0. Core methods include initialize, tools/list, tools/call, resources/list, resources/read, and prompts/list. The spec evolves on modelcontextprotocol.io; pin SDK versions in production hosts.
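To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope around a tools/call request, using only the standard library. The helper names (`jsonrpc_request`, `parse_response`) are illustrative and not part of any SDK; in a real host the official client builds these messages for you.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC ids only need to be unique per connection

def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope for an MCP method."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def parse_response(raw: str, expected_id: int) -> dict:
    """Return the result of a JSON-RPC response, raising on error or id mismatch."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0" or msg.get("id") != expected_id:
        raise ValueError("response does not match the pending request")
    if "error" in msg:
        raise RuntimeError(f"{msg['error']['code']}: {msg['error']['message']}")
    return msg["result"]

# A tools/call request names the tool and passes its arguments as an object.
call = jsonrpc_request(
    "tools/call",
    {"name": "lookup_order", "arguments": {"order_id": "ORD-1042"}},
)
print(json.dumps(call))
```

Matching responses by `id` is what lets a client keep several requests in flight on one connection.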
MCP adds process boundary overhead (stdio spawn or HTTP round trip). For sub-millisecond in-memory calls, keep hot-path tools in-process and expose only cross-team or high-churn integrations via MCP.
Cursor, Claude Desktop, and VS Code Copilot extensions use MCP for filesystem, GitHub, and database servers. Enterprise teams mirror the pattern: internal MCP registry with signed server bundles per business unit.
When asked about extensible agent platforms, mention discovery + allowlisting + transport auth—not just "we expose REST APIs." MCP is the standard answer for pluggable tool ecosystems in 2025–2026.
Architecture — Host, Client, Server, transport
An MCP deployment has three roles: the Host (your AI application), one or more Clients (connectors inside the host), and Servers (processes exposing tools, resources, and prompts). Transport choice—stdio vs SSE—determines latency, security boundary, and ops model.
End-to-end request flow
- User sends message to host application.
- Host loads MCP manifest; ensures server processes are running (stdio pool or SSE connections).
- Host calls tools/list (cached with TTL) and builds merged tool registry.
- LLM receives user message + allowlisted tools; may emit tool call.
- Host validates tool name and arguments; forwards tools/call to correct client.
- Server executes with its credentials; returns content blocks.
- Host normalizes result into model API format; continues loop until final answer.
Debugging MCP issues usually means tracing which hop failed: manifest misconfiguration, allowlist drop, schema mapping bug, server exception, or the model choosing the wrong tool. Log all seven stages above with correlation IDs.
| Debug signal | Likely cause |
|---|---|
| Tool not visible to model | Allowlist filter or mapping error |
| initialize timeout | Server crash on boot; check stderr |
| Invalid arguments | Schema mismatch between bridge and server |
| Intermittent 401 on SSE | Expired OAuth token; refresh logic missing |
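One way to find the failing hop is to wrap each stage of the request flow in a small tracing helper that shares a correlation ID. The sketch below is an assumption about how a host might do this; `trace_stage` and `first_failure` are hypothetical names, and a production host would emit these records to its logging backend rather than a list.

```python
import time
from contextlib import contextmanager
from typing import Optional

@contextmanager
def trace_stage(trace: list, correlation_id: str, stage: str):
    """Record one hop of the request flow, including failures and duration."""
    start = time.perf_counter()
    record = {"correlation_id": correlation_id, "stage": stage, "ok": True}
    try:
        yield record
    except Exception as exc:
        record["ok"] = False
        record["error"] = type(exc).__name__
        raise  # tracing must not swallow the original failure
    finally:
        record["ms"] = round((time.perf_counter() - start) * 1000, 2)
        trace.append(record)

def first_failure(trace: list) -> Optional[str]:
    """Return the name of the first stage that failed, if any."""
    for record in trace:
        if not record["ok"]:
            return record["stage"]
    return None
```

Wrapping the manifest load, tools/list, model call, validation, tools/call, and result normalization in `trace_stage` gives one record per hop that can be joined on the correlation ID.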
Role diagram
| Role | Examples | Responsibilities |
|---|---|---|
| Host | Cursor, Claude Desktop, your FastAPI agent service | UI, model calls, policy, aggregates MCP clients |
| Client | SDK session inside host | One server connection; routes JSON-RPC |
| Server | GitHub MCP, Postgres MCP, internal ticket MCP | Implements tools/resources; holds domain credentials |
stdio vs SSE transport
| Transport | Mechanism | Best for | Ops notes |
|---|---|---|---|
| stdio | Host spawns subprocess; stdin/stdout JSON-RPC | Local dev, sidecar on same node, CLI tools | Process per server; restart on crash; no network exposure |
| SSE (HTTP) | Server-Sent Events + POST for client→server | Remote shared services, multi-tenant registry | TLS, OAuth, rate limits, horizontal scale |
| Streamable HTTP | Single HTTP endpoint with optional SSE streaming; supersedes the HTTP+SSE transport as of spec revision 2025-03-26 | Cloud-native MCP gateways | Check SDK support before adopting |
A single host often runs multiple clients—one per configured server. The agent sees a flattened tool list (with namespaced prefixes like github__create_issue) after your registry merges and allowlists.
# mcp_manifest.yaml
servers:
  - name: acme-support
    transport: stdio
    command: python
    args: ["servers/acme_support_mcp.py"]
    allowed_tools: [lookup_order, search_policy]
  - name: github
    transport: sse
    url: https://mcp.internal.example/github/sse
    allowed_tools: [search_code, get_file]
    oauth:
      provider: github
      scopes: [repo:read]
@ConfigurationProperties(prefix = "mcp")
public record McpManifest(List<ServerEntry> servers) {
    public record ServerEntry(
            String name, String transport, String command,
            List<String> args, String url, List<String> allowedTools) {}
}
Namespace tool names when merging servers: {server}__{tool} prevents collisions and makes audit logs readable.
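A minimal sketch of that naming scheme follows, assuming `__` as the separator. The function names are illustrative. Splitting on the first separator only keeps tool names that themselves contain `__` intact.

```python
SEP = "__"

def qualify(server: str, tool: str) -> str:
    """Build the namespaced name the model sees, e.g. github__create_issue."""
    if SEP in server:
        raise ValueError(f"server name must not contain {SEP!r}: {server}")
    return f"{server}{SEP}{tool}"

def split_qualified(name: str) -> tuple:
    """Split on the first separator so tool names may contain '__' themselves."""
    server, sep, tool = name.partition(SEP)
    if not sep or not server or not tool:
        raise ValueError(f"not a namespaced tool name: {name}")
    return server, tool

def merge_registries(per_server: dict) -> dict:
    """Map each namespaced name back to its (server, tool) pair for routing."""
    merged = {}
    for server, tools in per_server.items():
        for tool in tools:
            merged[qualify(server, tool)] = (server, tool)
    return merged
```

The merged mapping doubles as the routing table: when the model emits a namespaced name, the host looks up which client should receive the tools/call.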
Spawning ten stdio servers per request exhausts file descriptors. Pool long-lived server processes and reuse sessions across agent turns within a conversation.
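The pooling idea can be sketched as a small cache keyed by server name. This is a synchronous illustration under stated assumptions: `open_session` and `close_session` are hypothetical callables standing in for whatever spawns and tears down a real MCP session, and a real host would likely use an async variant with locking.

```python
from typing import Callable, Generic, TypeVar

S = TypeVar("S")

class SessionPool(Generic[S]):
    """Keep one long-lived session per server name and reuse it across turns."""

    def __init__(self, open_session: Callable[[str], S],
                 close_session: Callable[[S], None]):
        self._open = open_session
        self._close = close_session
        self._sessions = {}

    def get(self, server_name: str) -> S:
        # Spawn only on first use; later turns reuse the same session.
        if server_name not in self._sessions:
            self._sessions[server_name] = self._open(server_name)
        return self._sessions[server_name]

    def evict(self, server_name: str) -> None:
        """Drop a crashed session so the next get() respawns it."""
        session = self._sessions.pop(server_name, None)
        if session is not None:
            self._close(session)

    def close_all(self) -> None:
        for name in list(self._sessions):
            self.evict(name)
```

Calling `evict` from the error path of a failed tools/call gives the restart-on-crash behavior without spawning a process per request.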
stdio servers inherit the host OS user. Run them as dedicated service accounts with minimal filesystem and network access—never as root.
Primitives — tools, resources, prompts, sampling
MCP exposes four capability families. Tools are model-invokable functions with JSON schemas. Resources are readable data URIs (files, DB rows, config). Prompts are reusable template slots the host can fetch. Sampling lets a server ask the host's LLM to complete text—a reverse call path for nested agents.
Sampling (server → host LLM)
Sampling is the least-used primitive but important for advanced servers: a tool can request the host to run a completion (e.g., summarize a large file server-side without shipping API keys to the server process). The host policy layer must approve sampling requests—rate limit and cap tokens per server identity.
Sampling inverts control flow: untrusted servers could burn token budgets. Disable sampling by default; enable per trusted server with daily quotas.
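A host-side policy check for sampling might look like the sketch below: servers without a configured quota are denied, each request is capped, and usage is tracked per server per day. The class and field names are assumptions for illustration, and the token counts would come from the host's own accounting.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SamplingPolicy:
    # server name -> tokens allowed per day; a missing entry means sampling is disabled
    daily_token_quota: dict
    max_tokens_per_request: int = 1024
    _used: dict = field(default_factory=dict)

    def approve(self, server: str, requested_tokens: int, today: date) -> bool:
        """Approve a sampling request only if it fits the per-request cap and daily quota."""
        quota = self.daily_token_quota.get(server)
        if quota is None:
            return False  # disabled by default
        if requested_tokens <= 0 or requested_tokens > self.max_tokens_per_request:
            return False
        key = (server, today)
        used = self._used.get(key, 0)
        if used + requested_tokens > quota:
            return False
        self._used[key] = used + requested_tokens
        return True
```

Passing `today` explicitly keeps the quota reset testable; a host would normally supply the current UTC date.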
Capability comparison
| Primitive | Who initiates | Typical use | Maps to OpenAI |
|---|---|---|---|
| Tools | Model (via host) | API calls, DB queries, ticket updates | tools / function calling |
| Resources | Host (often pre-model) | Inject file contents, schema snapshots | System message / file attachments |
| Prompts | Host or user slash command | Standard review templates, runbooks | Stored prompt partials |
| Sampling | Server → Host LLM | Server-side summarization without owning API keys | N/A (server uses host model) |
Tool lifecycle
- Host connects and calls initialize with protocol version.
- Host calls tools/list; server returns name, description, input JSON Schema.
- Host maps schemas to model-native tool format (OpenAI, Anthropic, Gemini).
- Model emits tool call; host forwards tools/call with arguments.
- Server returns structured content (text, image, resource refs); host feeds result to model.
Resources vs tools
Use resources when data is large, static-ish, or should not be invoked blindly by the model—think "attach the OpenAPI spec" rather than "call search_openapi." The host reads resources and decides what enters context. Tools are for actions with side effects: create ticket, run query, post message.
def mcp_tool_to_openai(mcp_tool: dict, prefix: str = "") -> dict:
    name = f"{prefix}{mcp_tool['name']}" if prefix else mcp_tool["name"]
    return {
        "type": "function",
        "function": {
            "name": name[:64],
            "description": mcp_tool.get("description", "")[:1024],
            "parameters": mcp_tool.get("input_schema", {"type": "object", "properties": {}}),
        },
    }

def merge_allowlisted(tools: list[dict], allowed: set[str], prefix: str) -> list[dict]:
    return [mcp_tool_to_openai(t, prefix=f"{prefix}__")
            for t in tools if t["name"] in allowed]
public OpenAiToolDefinition toOpenAi(McpToolDescriptor tool, String prefix) {
    String name = prefix.isEmpty() ? tool.name() : prefix + "__" + tool.name();
    return OpenAiToolDefinition.builder()
            .name(name.substring(0, Math.min(64, name.length())))
            .description(truncate(tool.description(), 1024))
            .parameters(tool.inputSchema())
            .build();
}
Every tool call is still a model round trip plus server execution. Resources preloaded into context consume tokens, so prefer resource subscriptions (resources/subscribe with update notifications) or a host-side content hash, and re-read only the URIs that changed.
Tool results can include multiple content blocks: text, images (base64), and embedded resource links. Hosts must normalize these into what the active model API accepts—Anthropic tool_result blocks differ from OpenAI tool role messages.
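The normalization step can be sketched as two small mappers. The example assumes content blocks have already been converted to plain dicts with `type`, `text`, `data`, `mimeType`, and `resource` keys (the SDK returns objects, so a real host would adapt attribute access), and it assumes a chat-completions style tool message that carries text only, which is why images become placeholders there.

```python
def to_openai_tool_message(tool_call_id: str, blocks: list) -> dict:
    """Flatten MCP content blocks into one text-only tool-role message."""
    parts = []
    for block in blocks:
        if block["type"] == "text":
            parts.append(block["text"])
        elif block["type"] == "image":
            parts.append(f"[image omitted: {block.get('mimeType', 'unknown')}]")
        elif block["type"] == "resource":
            parts.append(f"[resource: {block['resource']['uri']}]")
    return {"role": "tool", "tool_call_id": tool_call_id, "content": "\n".join(parts)}

def to_anthropic_tool_result(tool_use_id: str, blocks: list, is_error: bool = False) -> dict:
    """Map MCP content blocks onto a tool_result block, keeping images as base64."""
    content = []
    for block in blocks:
        if block["type"] == "text":
            content.append({"type": "text", "text": block["text"]})
        elif block["type"] == "image":
            content.append({"type": "image", "source": {
                "type": "base64",
                "media_type": block["mimeType"],
                "data": block["data"],
            }})
        elif block["type"] == "resource":
            content.append({"type": "text", "text": f"[resource: {block['resource']['uri']}]"})
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": content, "is_error": is_error}
```

Keeping both mappers behind one interface means the agent loop never needs to know which provider is active.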
Exposing 200 discovered tools to the model degrades routing accuracy. Allowlist 5–15 tools per agent persona; rotate sets by workflow stage.
Python server — FastMCP
The official Python SDK ships FastMCP, a decorator-driven server builder. Annotate functions with @mcp.tool() and @mcp.resource(); run with stdio for local sidecars or configure SSE for remote deployment.
Project layout
| File | Purpose |
|---|---|
| servers/support_mcp.py | FastMCP tool + resource definitions |
| host/manifest.yaml | Which servers to spawn + allowlists |
| host/schema_bridge.py | MCP → OpenAI / Anthropic mapping |
| host/agent_loop.py | Existing tool loop; swap local tools for MCP calls |
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("acme-support")

POLICIES = {"refund": "Annual plans: pro-rata credit within 30 days."}

@mcp.resource("policy://{topic}")
def get_policy(topic: str) -> str:
    """Return policy text for a topic slug."""
    return POLICIES.get(topic.lower(), "Policy not found.")

@mcp.tool()
def lookup_order(order_id: str) -> dict:
    """Fetch order status. IDs must start with ORD-."""
    if not order_id.upper().startswith("ORD-"):
        return {"error": "invalid_order_id"}
    return {"order_id": order_id.upper(), "status": "shipped", "eta_days": 2}

@mcp.tool()
def search_policy(query: str, limit: int = 5) -> list[dict]:
    """Keyword search over policy topics and snippet text."""
    q = query.lower()
    hits = [k for k, text in POLICIES.items() if q in k.lower() or q in text.lower()]
    return [{"topic": h, "text": POLICIES[h]} for h in hits[:limit]]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
FastMCP is Python-only; for JVM servers, see the Java Spring AI section.
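import json

from mcp import ClientSession
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def run_tool(session: ClientSession, name: str, args: dict) -> str:
    """Call one MCP tool and join its text blocks; non-text blocks are skipped."""
    result = await session.call_tool(name, arguments=args)
    parts = []
    for block in result.content:
        if block.type == "text":
            parts.append(block.text)
    return "\n".join(parts)

async def agent_turn(session: ClientSession, messages: list, openai_tools: list):
    response = await client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=openai_tools)
    msg = response.choices[0].message
    if not msg.tool_calls:
        return msg.content
    # The assistant message carrying tool_calls must precede the tool results.
    messages.append(msg)
    for tc in msg.tool_calls:
        # Strip the "{server}__" prefix added when the registry was merged.
        mcp_name = tc.function.name.split("__", 1)[-1]
        out = await run_tool(session, mcp_name, json.loads(tc.function.arguments))
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": out})
    return await agent_turn(session, messages, openai_tools)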
public String agentTurn(McpSyncClient client, List<Message> messages) {
    ChatResponse response = chatClient.prompt().messages(messages).tools(openAiTools).call();
    if (!response.hasToolCalls()) return response.getResult().getOutput().getContent();
    for (ToolCall call : response.getToolCalls()) {
        String result = client.callTool(call.name(), call.arguments());
        messages.add(ToolResponseMessage.from(call.id(), result));
    }
    return agentTurn(client, messages);
}
Docstrings on FastMCP functions become tool descriptions seen by the model—write them like API docs: constraints, enums, error shapes.
Pin mcp package version in both host and server repos. Protocol mismatches surface as silent initialize failures—add integration tests that list tools in CI.
Demonstrate you know the boundary: FastMCP server owns business logic; host owns model choice, allowlists, and logging.
Java server — Spring AI @McpTool
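Spring AI integrates MCP servers into Spring Boot applications. Annotate service methods with @McpTool and expose them over stdio or WebMVC SSE endpoints, which suits enterprise stacks that are already JVM-centric. Check your Spring AI version first: the @McpTool annotations arrived with the MCP annotations support in the 1.1 line, while 1.0 servers typically register @Tool methods through a ToolCallbackProvider.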
Spring AI MCP generates JSON Schema from method signatures and annotation descriptions (JavaDoc is not available at runtime). Controllers stay thin; domain services implement the actual ticket lookup or policy search. Deploy as a standalone jar sidecar or as a microservice behind your MCP gateway.
Dependencies (Maven)
| Artifact | Role |
|---|---|
| spring-ai-starter-mcp-server | MCP server autoconfiguration |
| spring-ai-starter-mcp-server-webmvc | SSE/HTTP transport |
| spring-boot-starter-validation | Argument validation on tools |
@Service
public class SupportMcpTools {
    private final OrderService orders;
    private final PolicyService policies;

    // Constructor injection; required because both fields are final.
    public SupportMcpTools(OrderService orders, PolicyService policies) {
        this.orders = orders;
        this.policies = policies;
    }

    @McpTool(name = "lookup_order",
            description = "Fetch order status. orderId must start with ORD-")
    public OrderStatus lookupOrder(
            @McpToolParam(description = "Order id like ORD-1042") String orderId) {
        if (!orderId.toUpperCase().startsWith("ORD-")) {
            throw new IllegalArgumentException("invalid_order_id");
        }
        return orders.getStatus(orderId.toUpperCase());
    }

    @McpTool(name = "search_policy",
            description = "Search refund and shipping policy snippets")
    public List searchPolicy(
            @McpToolParam(description = "Search query") String query,
            @McpToolParam(description = "Max results", required = false) Integer limit) {
        return policies.search(query, limit != null ? limit : 5);
    }
}
@SpringBootApplication
public class AcmeMcpServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(AcmeMcpServerApplication.class, args);
    }
}
// application.yml
// spring.ai.mcp.server.name: acme-support
// spring.ai.mcp.server.stdio.enabled: true
@Service
public class McpAgentBridge {
    private final ChatClient chatClient;
    private final List<McpSyncClient> mcpClients;
    private final Set<String> allowlist;

    public String run(String userMessage) {
        // Discover tools on every configured server, then keep only allowlisted ones.
        List<ToolCallback> tools = mcpClients.stream()
                .flatMap(c -> c.listTools().tools().stream())
                .filter(t -> allowlist.contains(t.name()))
                .map(this::toToolCallback)
                .toList();
        return chatClient.prompt()
                .user(userMessage)
                .toolCallbacks(tools)
                .call()
                .content();
    }
}
@Component
public class McpToolCallback implements ToolCallback {
    private final McpSyncClient client;
    private final McpToolDescriptor descriptor;

    @Override
    public ToolDefinition getToolDefinition() { return map(descriptor); }

    @Override
    public String call(String argumentsJson) {
        return client.callTool(descriptor.name(), parseJson(argumentsJson));
    }
}
JVM cold start (2–8 s) hurts stdio spawn-per-request patterns. Keep MCP server processes warm or use SSE deployment on Kubernetes with min replicas ≥ 1.
Spring AI maps @McpToolParam to JSON Schema properties with required flags. Nullable types and validation annotations propagate to the schema the model sees.
Share DTO records between REST controllers and MCP tools so OpenAPI and MCP schemas stay aligned via one source of truth.
Security — poisoning, injection, OAuth, least privilege
MCP moves trust boundaries: servers run with credentials, models choose tools, and tool descriptions are prompt surface. Production deployments need allowlists, description auditing, OAuth-scoped transports, and least-privilege server accounts.
Threat model
| Threat | Mechanism | Mitigation |
|---|---|---|
| Tool poisoning | Malicious server embeds hidden instructions in tool descriptions | Sign servers; review descriptions; strip HTML; human approve new tools |
| Resource injection | Resource URI returns attacker-controlled text into context | Host validates URI schemes; sandbox reads; max byte limits |
| Over-privileged tools | Model calls delete_database when read was enough | Separate read/write servers; per-persona allowlists |
| Token theft | Compromised host leaks OAuth refresh tokens | Short-lived tokens; vault sidecars; no tokens in prompts |
| Prompt injection via tool output | Ticket body contains "ignore previous instructions" | Output sanitization; structured JSON; secondary policy model |
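Two of the mitigations in the table, URI scheme validation with byte limits and structured wrapping of tool output, can be sketched with the standard library. The allowed scheme set, the byte cap, and the envelope field names are assumptions chosen for illustration; wrapping text as data does not by itself stop injection, it only gives the policy layer and the model a clear boundary.

```python
import json
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"policy"}  # assumed: only the custom policy:// scheme is readable
MAX_BYTES = 16_384            # assumed cap on any text that enters model context

def is_allowed_resource(uri: str) -> bool:
    """Accept only resource URIs whose scheme is explicitly allowlisted."""
    return urlparse(uri).scheme in ALLOWED_SCHEMES

def clip(text: str, max_bytes: int = MAX_BYTES) -> tuple:
    """Truncate to a byte budget without splitting a multi-byte character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text, False
    return raw[:max_bytes].decode("utf-8", errors="ignore"), True

def wrap_untrusted(source: str, text: str, max_bytes: int = MAX_BYTES) -> str:
    """Return tool or resource text as a JSON envelope marked as untrusted data."""
    content, truncated = clip(text, max_bytes)
    return json.dumps({"source": source, "trust": "untrusted_data",
                       "truncated": truncated, "content": content})
```

The envelope is what the host appends as the tool result, so a ticket body that says "ignore previous instructions" arrives labeled as data from a named source.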
OAuth for remote MCP (SSE)
- User authorizes host app via standard OAuth (PKCE for public clients).
- Host stores refresh token in vault; never passes to model.
- MCP client attaches access token to SSE connection per server registration.
- Server validates token scopes before executing tools.
- Rotate credentials; audit tool calls with user + token subject id.
| OAuth scope | Tools enabled |
|---|---|
| kb:read | search_policy, fetch_chunk |
| orders:read | lookup_order only |
| orders:write | create_refund (human approval gate) |
Map OAuth scopes to MCP tool allowlists in the host manifest—never expose write tools when the token is read-only.
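The scope table above translates directly into a lookup the host can enforce before forwarding any tools/call. In this sketch the mapping mirrors the table; the function names and the `human_approved` flag are illustrative assumptions about how an approval gate might be passed in.

```python
SCOPE_TOOLS = {
    "kb:read": {"search_policy", "fetch_chunk"},
    "orders:read": {"lookup_order"},
    "orders:write": {"create_refund"},
}
NEEDS_HUMAN_APPROVAL = {"create_refund"}

def tools_for_scopes(granted: set) -> set:
    """Union of tools enabled by the scopes present on the access token."""
    allowed = set()
    for scope in granted:
        allowed |= SCOPE_TOOLS.get(scope, set())
    return allowed

def authorize(tool: str, granted: set, human_approved: bool = False) -> bool:
    """Allow a call only if a granted scope enables it and any approval gate passed."""
    if tool not in tools_for_scopes(granted):
        return False
    if tool in NEEDS_HUMAN_APPROVAL and not human_approved:
        return False
    return True
```

Using `tools_for_scopes` when building the tool list means a read-only token never even exposes write tools to the model; `authorize` is the second check at call time.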
Refresh tokens on a background schedule before expiry so SSE sessions do not fail mid-conversation.
import logging

TRUSTED_SERVERS = {"acme-support", "github-readonly"}
log = logging.getLogger("mcp.security")

class SecurityError(Exception):
    """Raised when a server outside the trusted set is used."""

def log_security_event(event: str, **fields) -> None:
    log.warning("%s %s", event, fields)

def audit_tool_description(name: str, desc: str) -> bool:
    blocked = ["ignore previous", "system prompt", "override", "secret"]
    lower = desc.lower()
    if any(b in lower for b in blocked):
        log_security_event("tool_description_flag", tool=name)
        return False
    if len(desc) > 2000:
        return False  # excessively long descriptions are suspicious
    return True

def filter_tools(server_name: str, tools: list, allowed: set) -> list:
    if server_name not in TRUSTED_SERVERS:
        raise SecurityError(f"untrusted server: {server_name}")
    return [t for t in tools
            if t["name"] in allowed
            and audit_tool_description(t["name"], t.get("description", ""))]
public List<McpToolDescriptor> filterTools(String serverName, List<McpToolDescriptor> tools) {
    if (!trustedServers.contains(serverName)) {
        throw new SecurityException("untrusted server: " + serverName);
    }
    return tools.stream()
            .filter(t -> allowlist.contains(t.name()))
            .filter(t -> descriptionAuditor.isSafe(t.description()))
            .toList();
}
Treat MCP tool descriptions like user input—they enter the model context. Scan for injection patterns in CI when servers update.
Connecting to community MCP servers without code review is equivalent to installing unaudited browser extensions with DB credentials.
For security questions: least privilege allowlists, signed server artifacts, OAuth on SSE, output sanitization, and logging every tool call with arguments redacted.
Enterprise MCP registries require SBOM + security review before a server enters the catalog. Dev servers run in isolated namespaces with synthetic data only.
What this looks like in production
Production MCP is a registry + gateway + observability problem—not just a Python script. Remote SSE servers, semver contracts, staged rollouts, and unified tool audit logs let platform teams scale agents without N× custom integrations.
Production architecture
| Component | Function |
|---|---|
| Registry | Catalog of approved servers, versions, owners, allowed scopes |
| Gateway | TLS termination, OAuth, rate limits, routing to server pools |
| Host runtime | Agent service; merges tools; enforces per-tenant allowlists |
| Observability | Tool latency, error rates, argument hashes (redacted), server version |
Versioning strategy
- Servers publish semver; breaking schema changes bump major.
- Hosts pin server_version in manifest; auto-upgrade only patch.
- Contract tests: golden tools/list snapshots in CI.
- Blue/green server deploys; hosts drain old connections gracefully.
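The "auto-upgrade only patch" rule can be expressed as a small comparison. This sketch assumes plain MAJOR.MINOR.PATCH strings; pre-release and build suffixes are not handled and would need a proper semver library.

```python
def parse_semver(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def is_auto_upgradable(pinned: str, offered: str) -> bool:
    """True when the offered version differs from the pin only by a newer-or-equal patch."""
    p, o = parse_semver(pinned), parse_semver(offered)
    return p[:2] == o[:2] and o[2] >= p[2]
```

Anything that fails this check (a minor or major bump, or a downgrade) should go through a manifest change and review rather than being picked up automatically.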
Track 4 production checklist
- Signed server bundles in internal registry
- Per-agent allowlist ≤ 15 tools
- OAuth or mTLS on all remote transports
- Tool description audit in CI
- Structured logs: server, tool, latency, user, tenant
- Fallback when server unavailable (degraded mode message)
- Load test: concurrent SSE sessions at peak QPS
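from dataclasses import dataclass

# fetch_registry, connect_sse, vault_token, and ConfigError are host-side helpers not shown here.

@dataclass
class ProductionMcpConfig:
    registry_url: str
    server_pins: dict[str, str]  # name -> semver
    allowed_tools: dict[str, set[str]]
    oauth_vault_path: str
    max_tool_calls_per_turn: int = 8

async def load_production_servers(cfg: ProductionMcpConfig) -> list:
    catalog = await fetch_registry(cfg.registry_url)
    sessions = []
    for entry in catalog:
        pinned = cfg.server_pins.get(entry.name)
        if pinned is None:
            continue  # in the catalog but not enabled for this host
        if entry.version != pinned:
            raise ConfigError(f"version pin mismatch: {entry.name}")
        session = await connect_sse(entry.url, vault_token(cfg.oauth_vault_path, entry.name))
        listed = await session.list_tools()
        # filter_tools expects dicts, so flatten the SDK result first.
        tools = [{"name": t.name, "description": t.description, "input_schema": t.inputSchema}
                 for t in listed.tools]
        allowed = cfg.allowed_tools.get(entry.name, set())
        sessions.append((session, filter_tools(entry.name, tools, allowed)))
    return sessions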
@Scheduled(fixedRate = 60_000)
public void healthCheckMcpServers() {
    for (McpServerRegistration reg : registry.active()) {
        HealthStatus status = gateway.ping(reg);
        // Tag the outcome so failed checks are counted too; incrementing by 0 would hide them.
        metrics.counter("mcp.server.health", "name", reg.name(),
                "status", status.isUp() ? "up" : "down").increment();
        if (!status.isUp()) notificationService.alert(reg);
    }
}
Remote MCP adds network hop (10–80 ms) per tool call. Batch read-only operations in single tools where possible; avoid chatty micro-tool designs.
Platform teams expose one internal MCP gateway URL; product teams register servers via GitOps PR to the registry repo—mirroring internal API catalog patterns.
Latency budget per MCP tool call
| Transport | Connect | tools/list | tools/call p95 |
|---|---|---|---|
| stdio (warm process) | 0 ms (pooled) | 5–20 ms | 20–200 ms |
| SSE remote | 30–100 ms | 40–120 ms | 80–500 ms |
Budget MCP like external APIs: set per-tool timeouts, circuit-break after N failures, and return structured {"error":"server_unavailable"} to the model instead of hanging the agent loop.
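A minimal circuit breaker along these lines is sketched below. It covers the failure counting and the structured error only; per-tool timeouts are assumed to be enforced inside the callable (for example with `asyncio.wait_for` in an async host). The class name, thresholds, and injectable clock are illustrative choices.

```python
import time

UNAVAILABLE = {"error": "server_unavailable"}

class CircuitBreaker:
    """Stop calling a failing server for a cool-down period after N consecutive failures."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return dict(UNAVAILABLE)  # open: fail fast without touching the server
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return dict(UNAVAILABLE)
        self.failures = 0
        self.opened_at = None
        return result
```

Returning the structured error instead of raising lets the model see that the tool is unavailable and answer in degraded mode rather than stalling the loop.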
Mapping MCP to Anthropic and Bedrock
OpenAI uses tools with function schemas. Anthropic uses tool_use blocks with input objects. Bedrock Converse API unifies tool config across model vendors. Your schema bridge should emit provider-specific payloads from one canonical MCP descriptor list—do not fork business logic per provider.
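def to_provider_tools(mcp_tools: list[dict], provider: str, prefix: str) -> list[dict]:
    """Emit provider-specific tool payloads from one canonical MCP descriptor list."""
    if provider == "openai":
        # mcp_tool_to_openai concatenates the prefix as-is, so add the separator here.
        return [mcp_tool_to_openai(t, prefix=f"{prefix}__") for t in mcp_tools]
    if provider == "anthropic":
        return [{"name": f"{prefix}__{t['name']}",
                 "description": t.get("description", ""),
                 "input_schema": t.get("input_schema", {"type": "object", "properties": {}})}
                for t in mcp_tools]
    raise ValueError(f"unsupported provider: {provider}")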
public List<ToolDefinition> toProviderTools(
        List<McpToolDescriptor> tools, Provider provider, String prefix) {
    return tools.stream().map(t -> switch (provider) {
        case OPENAI -> openAiMapper.map(t, prefix);
        case ANTHROPIC -> anthropicMapper.map(t, prefix);
        case BEDROCK -> bedrockMapper.map(t, prefix);
    }).toList();
}
Related Track 4 guides
- Agents explained — tool loops and when agents beat chains
- Tool calling — function schemas before MCP wiring
- Multi-agent orchestration — supervisors calling MCP specialists
- Agentic RAG — retrieval tools exposed via MCP servers
Contrast MCP with OpenAPI: OpenAPI describes HTTP APIs for humans and codegen; MCP describes model-invokable capabilities with runtime discovery and bidirectional sampling. The two can coexist: an MCP server can wrap an internal REST API.
Local development checklist
- Run mcp_inspect.py to list tools without invoking the LLM
- Snapshot golden tools/list JSON in unit tests
- Test allowlist drops: ensure removed tools never appear in OpenAI payload
- Simulate server crash mid-session; host should reconnect or degrade gracefully
- Record one full trace (initialize → list → call) for onboarding docs
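The golden-snapshot item in the checklist can be implemented as a plain diff over two tools/list payloads. This sketch assumes each tool is a dict with `name` and `inputSchema` keys as on the wire; the function name is illustrative.

```python
def diff_tool_snapshots(golden: list, current: list) -> dict:
    """Compare two tools/list snapshots by tool name and input schema."""
    g = {t["name"]: t for t in golden}
    c = {t["name"]: t for t in current}
    return {
        "removed": sorted(g.keys() - c.keys()),
        "added": sorted(c.keys() - g.keys()),
        "changed": sorted(
            name for name in g.keys() & c.keys()
            if g[name].get("inputSchema") != c[name].get("inputSchema")
        ),
    }
```

A CI test can load the committed golden JSON, fetch the live list, and fail when `removed` or `changed` is non-empty, treating `added` as a prompt to review the new tool's description.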
Spec resources
The MCP specification lives at modelcontextprotocol.io. SDKs for Python and TypeScript track spec versions—when Anthropic or the steering committee publishes transport updates, upgrade host and server SDKs in lockstep. Internal wrappers should not reimplement JSON-RPC; use official clients to avoid subtle protocol bugs.