MCP Server Token Bloat: 55,000 Tokens Wasted Before Your Agent Runs
Connecting MCP servers to Claude Code or Cursor silently injects 55K+ tokens of tool definitions into every turn. Here's the real cost — and how to cut it.
You connect three MCP servers to Claude Code on Monday morning. Before you type a single character, the agent has already burned through tokens, sometimes more than 50,000 of them. Tool definitions, schemas, descriptions, and parameter docs get injected into the system prompt every turn. A developer who ran the math measured 55,000 tokens of overhead before the first user message, just from MCP server registration.
That number changes how you should think about MCP. It is not “free middleware.” Every server you add competes with your code, your conversation history, and your model’s working memory for the same context window.
The 55K Token Tax No One Talks About
When an MCP server connects to Claude Code, Cursor, or any MCP-compatible client, the client serializes every tool the server exposes — name, description, JSON schema for parameters, and often verbose example text — into the model’s system prompt. This payload re-enters context on every single turn.
The dev.to author who measured this found that a stack with GitHub, Linear, Slack, Postgres, and a few smaller servers passed 55,000 tokens before any user message. Claude Sonnet 4.6 ships with a 200K context window. That means 27.5% of the window, more than a quarter, is gone before you ask a question.
On Opus 4.7 with 1M context the percentage shrinks, but the cost does not. Cached input tokens still bill at a fraction of regular input — and on every turn that fraction adds up.
Three concrete impacts:
- API cost per turn rises. At cached input rates around $0.30 per million tokens, 55K tokens of overhead bills you about $0.0165 per turn (call it $0.02) just to remind the model which tools exist. Across a 100-turn session that is roughly $1.65 of pure tooling tax.
- Response latency increases. Larger prompts take longer to process. First-token latency on a 60K-token prompt is measurably worse than on a 5K prompt, even with caching.
- Agent reliability drops. Models given more tools than they need confuse parameters, pick wrong tools, and hallucinate arguments. The “lost in the middle” effect is real for tool selection.
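The arithmetic behind the cost and context figures above can be checked in a few lines. This is a minimal sketch that uses only the numbers quoted in the article; the variable and function names are illustrative, and the rate is the article's approximate cached-input price, not a quoted price list.

```python
# Figures come from the article; names are illustrative assumptions.
OVERHEAD_TOKENS = 55_000        # tool-definition overhead re-sent each turn
CACHED_RATE_PER_MTOK = 0.30     # USD per million cached input tokens (approximate)
CONTEXT_WINDOW = 200_000        # context window size cited in the article
TURNS = 100                     # length of the example session


def overhead_cost_per_turn(tokens: int, rate_per_mtok: float) -> float:
    """Dollar cost of re-sending `tokens` of cached input on one turn."""
    return tokens / 1_000_000 * rate_per_mtok


per_turn = overhead_cost_per_turn(OVERHEAD_TOKENS, CACHED_RATE_PER_MTOK)
per_session = per_turn * TURNS
context_share = OVERHEAD_TOKENS / CONTEXT_WINDOW

print(f"per turn:    ${per_turn:.4f}")
print(f"per session: ${per_session:.2f}")
print(f"context use: {context_share:.1%}")
```

Swapping in your own token count and your provider's current rate gives a quick estimate of what a given server stack costs per session.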
Where the Tokens Actually Go
Run a token accounting on a typical MCP server registration and you find three categories of bloat:
Tool descriptions written for humans. Most MCP servers ship descriptions like “Search for issues using a query string. Supports filters for status, assignee, priority, and project. Returns up to 50 results sorted by relevance. Use this when the user asks about specific issues or wants to find issues matching criteria.” That sample alone is roughly 50 tokens; with the longer usage notes many servers add, a description can reach ~200 tokens. Multiply by 30+ tools and you have 6,000 tokens before any schema.
JSON schemas with verbose property docs. Each parameter gets a description field, often duplicated across tools. A Linear MCP server exposing 20 tools with 5 parameters each, where each parameter carries a 30-token description, runs 3,000 tokens before nesting and enum values.
Example text and usage hints. Some servers append example calls like search_issues(query='bug', status='open') to every tool. Helpful for the model, but it triples the cost per tool.
The measurement broke down a real stack: GitHub MCP at ~12K tokens, Linear at ~9K, Slack at ~7K, a custom Postgres server at ~14K, plus filesystem and memory servers at ~5K each. Those rounded figures sum to ~52K; the measured total was ~55K before the first user prompt.
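A rough version of this accounting can be run against any tool definition. The sketch below assumes tools are dicts with `name`, `description`, and `inputSchema` keys (the shape MCP tool listings use) and assumes about 4 characters per token, which is only a heuristic; a real tokenizer will give different numbers. Example text embedded in a description is counted with the description, since it cannot be separated reliably.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: crude average; use a real tokenizer for accuracy


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN) if text else 0


def account_tool(tool: dict) -> dict:
    """Split one tool definition into description, parameter-doc, and bare-schema tokens."""
    schema = tool.get("inputSchema", {})
    props = schema.get("properties", {})
    # All per-parameter description strings, joined for counting.
    param_docs = " ".join(p.get("description", "") for p in props.values())
    # The same schema with parameter descriptions stripped out.
    bare_props = {
        name: {k: v for k, v in prop.items() if k != "description"}
        for name, prop in props.items()
    }
    bare_schema = {**schema, "properties": bare_props}
    return {
        "description": estimate_tokens(tool.get("description", "")),
        "param_docs": estimate_tokens(param_docs),
        "schema": estimate_tokens(json.dumps(bare_schema)),
    }


# Hypothetical tool definition, using the description quoted in the article.
SAMPLE_TOOL = {
    "name": "search_issues",
    "description": (
        "Search for issues using a query string. Supports filters for status, "
        "assignee, priority, and project. Returns up to 50 results sorted by "
        "relevance. Use this when the user asks about specific issues or wants "
        "to find issues matching criteria."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search query."},
            "status": {"type": "string", "enum": ["open", "closed"],
                       "description": "Filter by issue status."},
        },
        "required": ["query"],
    },
}

print(account_tool(SAMPLE_TOOL))
```

Summing these per-tool dicts across a server shows which of the three categories dominates and where trimming would pay off first.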
How to Cut the Bloat
You do not have to disconnect every server. You have to be deliberate about which tools enter context.
Audit what you actually use. Most developers connect a server, use three of its tools regularly, and forget the rest exist. Claude Code’s /mcp command lists every tool; check it against your last week of sessions. If you have not called a tool, it should not be loaded.
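The audit itself is a set difference between what is loaded and what was called. Here is a minimal sketch, assuming you have copied the tool list from /mcp by hand and extracted called tool names from your own session logs; the server names, tool names, and data shapes below are hypothetical.

```python
def unused_tools(loaded: dict[str, list[str]], calls: list[str]) -> dict[str, list[str]]:
    """For each server, list the loaded tools that never appear in the call log."""
    called = set(calls)
    return {
        server: sorted(tool for tool in tools if tool not in called)
        for server, tools in loaded.items()
    }


# Hypothetical inventory: server name -> tools it exposes.
LOADED = {
    "github": ["create_issue", "search_issues", "list_prs"],
    "slack": ["post_message", "list_channels"],
}
# Hypothetical tool calls pulled from last week's sessions.
CALLS = ["search_issues", "post_message", "search_issues"]

for server, tools in unused_tools(LOADED, CALLS).items():
    print(f"{server}: {len(tools)} unused -> {', '.join(tools) or 'none'}")
```

A server whose tools all show up as unused is a candidate for removal from that project's config.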
Use scoped server profiles. Claude Code and Cursor both support per-project MCP config. Your blog repo does not need the Postgres server. Your data pipeline repo does not need the Notion server. Configure per-project rather than globally.
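One way to keep per-project configs small is to generate them from a single catalog. The sketch below assumes a project-level JSON file with a top-level `mcpServers` key, which is the shape I understand Claude Code's `.mcp.json` and Cursor's `.cursor/mcp.json` to use; check your client's documentation before relying on it. The server commands in the catalog are placeholders, not real packages.

```python
import json

# Hypothetical catalog of every server you might use; commands are placeholders.
CATALOG = {
    "github": {"command": "github-mcp-server", "args": []},
    "postgres": {"command": "postgres-mcp-server", "args": []},
    "notion": {"command": "notion-mcp-server", "args": []},
}


def project_config(needed: list[str]) -> str:
    """Return JSON config text containing only the servers this project needs."""
    missing = [name for name in needed if name not in CATALOG]
    if missing:
        raise KeyError(f"unknown servers: {missing}")
    return json.dumps({"mcpServers": {name: CATALOG[name] for name in needed}}, indent=2)


# A blog repo that only needs GitHub; a data pipeline that needs GitHub and Postgres.
print(project_config(["github"]))
print(project_config(["github", "postgres"]))
```

The output can be saved as the project-level config file, so each repo loads only the tool definitions it will plausibly use.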
Prefer deferred tool loading where supported. Some clients now support lazy tool registration where only tool names load up front and full schemas fetch on demand. If your client supports it, enable it — token cost can drop 60–80%.
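To see why deferral saves tokens, it helps to model it. The class below is a toy illustration of the idea (names up front, full definitions only once requested); it is not how any particular client implements lazy registration, and the tool data is hypothetical.

```python
import json


class LazyToolRegistry:
    """Toy model of deferred loading: names up front, schemas on demand."""

    def __init__(self, tools: list[dict]) -> None:
        self._tools = {tool["name"]: tool for tool in tools}
        self._loaded: set[str] = set()

    def load(self, name: str) -> dict:
        """Fetch a full definition and keep it in context from now on."""
        tool = self._tools[name]  # raises KeyError for unknown names
        self._loaded.add(name)
        return tool

    def context_payload(self) -> str:
        """Text that would be serialized into the prompt on the next turn."""
        return json.dumps({
            "available": sorted(self._tools),
            "loaded": [self._tools[n] for n in sorted(self._loaded)],
        })


# Hypothetical tools with deliberately long descriptions.
TOOLS = [
    {"name": name, "description": "Long usage notes. " * 10}
    for name in ("search_issues", "create_issue", "list_projects")
]

registry = LazyToolRegistry(TOOLS)
eager_size = len(json.dumps(TOOLS))          # everything loaded up front
lazy_size = len(registry.context_payload())  # names only
registry.load("search_issues")
after_one = len(registry.context_payload())  # names plus one full definition

print(eager_size, lazy_size, after_one)
```

The payload grows only with the tools a session actually touches, which is where the savings come from when most tools go unused.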
Write tighter tool descriptions if you author MCP servers. A description should tell the model when to use the tool in 1-2 sentences. Schema doc fields should be short noun phrases, not tutorials.
Measure before and after. Use your API provider’s token usage logs. Anthropic’s console shows cached vs uncached input per request. Set a budget — say, 10K tokens of overhead max — and trim until you hit it.
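A before-and-after comparison can be scripted from usage logs. This sketch assumes each request's usage record is a dict with `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`, which is my understanding of the usage object in Anthropic's Messages API; verify the field names against your provider. The sample numbers are hypothetical.

```python
INPUT_FIELDS = ("input_tokens", "cache_creation_input_tokens", "cache_read_input_tokens")


def total_input(usage: dict) -> int:
    """Sum uncached and cached input tokens for one request."""
    return sum(usage.get(field) or 0 for field in INPUT_FIELDS)


def overhead_report(with_servers: dict, baseline: dict, budget: int = 10_000) -> dict:
    """Compare the same prompt sent with and without MCP servers attached."""
    overhead = total_input(with_servers) - total_input(baseline)
    return {"overhead": overhead, "budget": budget, "over_by": max(0, overhead - budget)}


# Hypothetical usage records for the same first message.
BASELINE = {"input_tokens": 1_200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0}
WITH_SERVERS = {"input_tokens": 1_200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 55_000}

print(overhead_report(WITH_SERVERS, BASELINE))
```

Rerunning the comparison after each pruning step shows whether the change moved you toward the budget.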
The deeper lesson: MCP is an abstraction with a token-shaped cost that nobody puts on the invoice. You pay for it in latency, accuracy, and dollars per turn. Treat each MCP connection like a dependency in package.json — useful when needed, dead weight when not.