MCP Server Token Bloat: 55,000 Tokens Wasted Before Your Agent Runs
Connecting MCP servers to Claude Code or Cursor silently injects 55K+ tokens of tool definitions into every turn. Here's the real cost — and how to cut it.
You connect three MCP servers to Claude Code on Monday morning. Before you type a single character, the agent has already burned through tokens — sometimes more than 50,000 of them. Tool definitions, schemas, descriptions, and parameter docs get injected into the system prompt every turn. A developer who ran the math measured 55,000 tokens of overhead per session, just from MCP server registration.
That number changes how you should think about MCP. It is not “free middleware.” Every server you add competes with your code, your conversation history, and your model’s working memory for the same context window.
The 55K Token Tax No One Talks About
When an MCP server connects to Claude Code, Cursor, or any MCP-compatible client, the client serializes every tool the server exposes — name, description, JSON schema for parameters, and often verbose example text — into the model’s system prompt. This payload re-enters context on every single turn.
The dev.to author who measured this found that a stack with GitHub, Linear, Slack, Postgres, and a few smaller servers crossed 55,000 tokens before any user message. Claude Sonnet 4.6 ships with a 200K context window. That means roughly 27% of your usable context is gone before you ask a question.
On Opus 4.7 with 1M context the percentage shrinks, but the cost does not. Cached input tokens still bill at a fraction of regular input — and on every turn that fraction adds up.
Three concrete impacts:
- API cost per turn rises. At cached input rates around $0.30 per million tokens, 55K tokens of overhead bills you roughly $0.02 per turn just to remind the model which tools exist. Across a 100-turn session that is $2 of pure tooling tax.
- Response latency increases. Larger prompts take longer to process. First-token latency on a 60K-token prompt is measurably worse than on a 5K prompt, even with caching.
- Agent reliability drops. Models given more tools than they need confuse parameters, pick wrong tools, and hallucinate arguments. The “lost in the middle” effect is real for tool selection.
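The cost bullet above is straight arithmetic; a quick sketch using the figures from the text (the article rounds $0.0165 up to $0.02 per turn and $1.65 up to $2 per session):

```python
# Per-turn and per-session cost of MCP tool-definition overhead,
# using the figures cited in the text.
OVERHEAD_TOKENS = 55_000        # tool definitions re-sent every turn
CACHED_RATE_PER_M = 0.30        # dollars per million cached input tokens
TURNS = 100                     # a long working session

cost_per_turn = OVERHEAD_TOKENS / 1_000_000 * CACHED_RATE_PER_M
cost_per_session = cost_per_turn * TURNS

print(f"per turn:    ${cost_per_turn:.4f}")     # $0.0165
print(f"per session: ${cost_per_session:.2f}")  # $1.65
```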
Where the Tokens Actually Go
Run a token accounting on a typical MCP server registration and you find three categories of bloat:
Tool descriptions written for humans. Most MCP servers ship descriptions like “Search for issues using a query string. Supports filters for status, assignee, priority, and project. Returns up to 50 results sorted by relevance. Use this when the user asks about specific issues or wants to find issues matching criteria.” That single description runs ~50 tokens. Multiply by 30+ tools and you have 1,500+ tokens before any schema.
JSON schemas with verbose property docs. Each parameter gets a description field, often duplicated across tools. A Linear MCP server exposing 20 tools with 5 parameters each, where each parameter carries a 30-token description, runs 3,000 tokens before nesting and enum values.
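A rough chars/4 heuristic over the serialized JSON shows how fast schema docs add up. The schema below is a made-up example in the style described, not a real Linear server's schema, and the heuristic is approximate:

```python
import json

# Hypothetical tool schema: five parameters, each carrying a verbose
# (and largely duplicated) description field, as described above.
PARAM_DOC = (
    "Filter results by this field. Accepts any valid value for the field "
    "and combines with other filters using AND."
)
schema = {
    "name": "search_issues",
    "description": "Search for issues using a query string.",
    "parameters": {
        "type": "object",
        "properties": {
            name: {"type": "string", "description": PARAM_DOC}
            for name in ["query", "status", "assignee", "priority", "project"]
        },
    },
}

# ~4 characters per token is a common rough estimate for English text.
est_tokens = len(json.dumps(schema)) // 4
print(f"~{est_tokens} tokens for one tool, ~{est_tokens * 20:,} for 20 tools")
```

A single tool lands in the low hundreds of tokens; a 20-tool server is already in the thousands before enums and nesting.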
Example text and usage hints. Some servers append example calls like search_issues(query='bug', status='open') to every tool. Helpful for the model, but it triples the cost per tool.
The measurement broke down a real stack: GitHub MCP at ~12K tokens, Linear at ~9K, Slack at ~7K, a custom Postgres server at ~14K, plus filesystem and memory servers around ~5K each. That is ~52K from the named servers alone; a few smaller ones pushed the total to ~55K before the first user prompt.
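Summing those per-server figures against the 200K window cited earlier (the numbers are the article's approximations; the named servers alone account for the bulk of the measured ~55K):

```python
# Approximate per-server overhead from the measured stack (tokens).
SERVERS = {
    "github": 12_000,
    "linear": 9_000,
    "slack": 7_000,
    "postgres": 14_000,
    "filesystem": 5_000,
    "memory": 5_000,
}
CONTEXT_WINDOW = 200_000  # Sonnet-class window cited in the text

total = sum(SERVERS.values())
pct = total / CONTEXT_WINDOW * 100
print(f"{total:,} tokens before the first prompt = {pct:.0f}% of the window")
# → 52,000 tokens before the first prompt = 26% of the window
```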
How to Cut the Bloat
You do not have to disconnect every server. You have to be deliberate about which tools enter context.
Audit what you actually use. Most developers connect a server, use three of its tools regularly, and forget the rest exist. Claude Code’s /mcp command lists every tool; check it against your last week of sessions. If you have not called a tool, it should not be loaded.
Use scoped server profiles. Claude Code and Cursor both support per-project MCP config. Your blog repo does not need the Postgres server. Your data pipeline repo does not need the Notion server. Configure per-project rather than globally.
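Claude Code, for instance, reads project-scoped servers from a .mcp.json file at the repo root, so only repos that declare a server pay its token cost. A minimal sketch; the package name and connection argument here are placeholders, not a real published server:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@example/postgres-mcp", "--connection", "env:DATABASE_URL"]
    }
  }
}
```

A repo with no .mcp.json loads none of these tools, which is exactly the point.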
Prefer deferred tool loading where supported. Some clients now support lazy tool registration where only tool names load up front and full schemas fetch on demand. If your client supports it, enable it — token cost can drop 60–80%.
Write tighter tool descriptions if you author MCP servers. A description should tell the model when to use the tool in 1-2 sentences. Schema doc fields should be short noun phrases, not tutorials.
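The payoff of tighter descriptions is easy to quantify with the same chars/4 heuristic. Both strings below are illustrative examples, and the token counts are rough estimates:

```python
# Two ways to describe the same tool; lengths compared with the
# rough 4-chars-per-token heuristic. Both strings are made up.
verbose = (
    "Search for issues using a query string. Supports filters for status, "
    "assignee, priority, and project. Returns up to 50 results sorted by "
    "relevance. Use this when the user asks about specific issues or wants "
    "to find issues matching criteria."
)
tight = "Search issues by query; optional status/assignee/priority/project filters."

ratio = len(verbose) / len(tight)
print(f"verbose ~{len(verbose) // 4} tokens, tight ~{len(tight) // 4} tokens "
      f"({ratio:.1f}x smaller)")
```

Across a 30-tool server, a 3x reduction on every description compounds into thousands of tokens saved per turn.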
Measure before and after. Use your API provider’s token usage logs. Anthropic’s console shows cached vs uncached input per request. Set a budget — say, 10K tokens of overhead max — and trim until you hit it.
Cursor's MCP integration supports per-project tool scoping, which makes pruning bloated servers faster than editing global Claude Code config.
The deeper lesson: MCP is an abstraction with a token-shaped cost that nobody puts on the invoice. You pay for it in latency, accuracy, and dollars per turn. Treat each MCP connection like a dependency in package.json — useful when needed, dead weight when not.
FAQ
Does MCP token overhead apply to all clients or just Claude Code?
Any MCP-compatible client pays it: Claude Code, Cursor, and others all serialize connected servers' tool definitions into the system prompt on every turn.
Does prompt caching make this irrelevant?
No. Cached input bills at a reduced rate rather than zero, and the overhead still occupies context window and adds latency on every turn.
Is there a way to dynamically load tools mid-session?
Some clients support deferred (lazy) tool loading: only tool names register up front and full schemas fetch on demand, cutting overhead by 60-80% where available.
Related reading
- Hermes Memory Installer Review: One-Command Persistent Memory for Local AI Agents (2026-05-17). Nous Research's Hermes Memory Installer adds local persistent memory to AI agents with one shell command. We compare its file-based approach to Mem0 and Letta.
- Tokenyst Review: Track Claude Code API Costs Before the Bill Lands (2026-05-17). A practical look at Tokenyst, an open-source local monitor that tracks Claude Code API token usage in real time and alerts you before runaway agent loops turn into surprise Anthropic bills.
- Anthropic Managed Agents Add 'Dreaming': Background Outcomes Without Your Own Loop (2026-05-17). Anthropic's Managed Agents platform adds 'dreaming', background agent execution that explores outcomes on Anthropic's infrastructure. How the new capability changes the build-vs-buy math for teams shipping on Claude.
- Anthropic Taps SpaceX's 220K-GPU Colossus 1 to Fix Claude Rate Limits (2026-05-17). Anthropic reportedly secured access to SpaceX's 220,000-GPU Colossus 1 cluster to relieve Claude API capacity pressure. Here's what changes for the 529 errors and tight rate limits hitting your coding agents.
- Claude in Microsoft 365: Outlook Joins, Word/Excel/PowerPoint Hit GA (2026-05-17). Anthropic is rolling Claude into Microsoft 365: Outlook gains support and Word, Excel, and PowerPoint integrations leave preview for general availability. Here's what changes for developers and which workflows actually benefit.