MCP Server Token Bloat: 55,000 Tokens Wasted Before Your Agent Runs

You connect a handful of MCP servers to Claude Code on Monday morning. Before you type a single character, the agent has already burned through tokens, sometimes more than 50,000 of them. Tool definitions, schemas, descriptions, and parameter docs get injected into the system prompt on every turn. A developer who ran the math measured 55,000 tokens of overhead from MCP server registration alone, before the first user message.

That number changes how you should think about MCP. It is not “free middleware.” Every server you add competes with your code, your conversation history, and your model’s working memory for the same context window.

The 55K Token Tax No One Talks About

When an MCP server connects to Claude Code, Cursor, or any MCP-compatible client, the client serializes every tool the server exposes — name, description, JSON schema for parameters, and often verbose example text — into the model’s system prompt. This payload re-enters context on every single turn.

The dev.to author who measured this found that a stack with GitHub, Linear, Slack, Postgres, and a few smaller servers passed 55,000 tokens before any user message. Claude Sonnet 4.6 ships with a 200K context window. That means roughly 27% of your usable context is gone before you ask a question.

On Opus 4.7 with 1M context the percentage shrinks, but the cost does not. Cached input tokens still bill at a fraction of the regular input price, and that fraction is charged again on every turn.
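The percentages are straight division of overhead by window size. Here is a small sketch for plugging in your own numbers. The 55K figure is the measurement above, and the window sizes are the ones quoted for each model.

```python
# Share of the context window consumed by MCP tool definitions before any user message.
OVERHEAD_TOKENS = 55_000  # the measured overhead quoted above


def context_share(overhead: int, window: int) -> float:
    """Fraction of a context window taken up by tool-definition overhead."""
    return overhead / window


for label, window in [("200K window", 200_000), ("1M window", 1_000_000)]:
    share = context_share(OVERHEAD_TOKENS, window)
    left = window - OVERHEAD_TOKENS
    print(f"{label}: {share:.1%} used by tool definitions, {left:,} tokens left")
```

The overhead in absolute tokens is the same in both cases. Only its share of the window changes.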

Three concrete impacts:

API cost per turn rises. At cached input rates around $0.30 per million tokens, 55K tokens of overhead costs about $0.0165 per turn (call it two cents) just to remind the model which tools exist. Across a 100-turn session that is about $1.65 of pure tooling tax.
Response latency increases. Larger prompts take longer to process. First-token latency on a 60K-token prompt is measurably worse than on a 5K-token prompt, even with caching.
Agent reliability drops. Models given more tools than they need confuse parameters, pick wrong tools, and hallucinate arguments. The “lost in the middle” effect is real for tool selection.
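The cost arithmetic can also be written as a sketch you can rerun with your own overhead and rates. It assumes every turn reads the whole overhead from cache at one flat rate. That understates the real bill: the first turn pays a cache-write price, and rates differ by model.

```python
# Rough per-turn and per-session cost of re-sending MCP tool definitions.
# Simplification: every turn is billed at a single cached-read rate.
CACHED_RATE_PER_MTOK = 0.30  # USD per million cached input tokens (figure quoted above)


def overhead_cost(overhead_tokens: int, turns: int,
                  rate_per_mtok: float = CACHED_RATE_PER_MTOK) -> float:
    """USD spent re-reading tool definitions across `turns` turns."""
    return overhead_tokens * turns * rate_per_mtok / 1_000_000


per_turn = overhead_cost(55_000, turns=1)
per_session = overhead_cost(55_000, turns=100)
print(f"per turn: ${per_turn:.4f}, per 100-turn session: ${per_session:.2f}")
```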

Where the Tokens Actually Go

Run a token accounting on a typical MCP server registration and you find three categories of bloat:

Tool descriptions written for humans. Most MCP servers ship descriptions like “Search for issues using a query string. Supports filters for status, assignee, priority, and project. Returns up to 50 results sorted by relevance. Use this when the user asks about specific issues or wants to find issues matching criteria.” That single description is roughly 50 to 60 tokens. Multiply by 30+ tools and you are near 2,000 tokens before any schema, and servers whose descriptions run to ~200 tokens each push that past 6,000.

JSON schemas with verbose property docs. Each parameter gets a description field, often duplicated across tools. A Linear MCP server exposing 20 tools with 5 parameters each, where each parameter carries a 30-token description, runs 3,000 tokens before nesting and enum values.
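The arithmetic here is 20 × 5 × 30 = 3,000. To see where your own servers' schema tokens go, the sketch below walks one tool definition in the MCP shape (name, description, inputSchema) and estimates how much of it is parameter docs versus the whole schema. The 4-characters-per-token rule and the sample tool are assumptions for illustration. Use your provider's tokenizer for real numbers.

```python
import json


def approx_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    # Replace with your provider's token counter for real measurements.
    return (len(text) + 3) // 4


def param_doc_tokens(tool: dict) -> int:
    """Estimated tokens spent on parameter descriptions in one MCP tool definition."""
    properties = tool.get("inputSchema", {}).get("properties", {})
    return sum(approx_tokens(spec.get("description", "")) for spec in properties.values())


def schema_tokens(tool: dict) -> int:
    """Estimated tokens for the whole serialized input schema: keys, types, enums, and docs."""
    return approx_tokens(json.dumps(tool.get("inputSchema", {})))


# The paragraph's back-of-envelope: 20 tools x 5 parameters x 30 tokens per description.
linear_param_docs = 20 * 5 * 30

# Hypothetical tool definition for illustration; not copied from a real server.
tool = {
    "name": "search_issues",
    "description": "Search issues by query.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string",
                      "description": "Free-text search query matched against issue titles and bodies."},
            "status": {"type": "string", "enum": ["open", "closed"],
                       "description": "Filter results to issues in this workflow state."},
        },
        "required": ["query"],
    },
}

print(linear_param_docs, param_doc_tokens(tool), schema_tokens(tool))
```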

Example text and usage hints. Some servers append example calls like search_issues(query='bug', status='open') to every tool. That helps the model, but it can double or triple the cost per tool.

The measurement broke down a real stack: GitHub MCP at ~12K tokens, Linear at ~9K, Slack at ~7K, a custom Postgres server at ~14K, plus filesystem and memory servers at ~5K each. Total: ~55K before the first user prompt.
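Tallying the per-server estimates is a quick way to see which server to trim first. The listed figures sum to about 52K. The gap to the reported ~55K is within the rounding of six approximate numbers. The dictionary below just restates those estimates.

```python
# Per-server overhead estimates from the measurement above (approximate, in tokens).
breakdown = {
    "github": 12_000,
    "linear": 9_000,
    "slack": 7_000,
    "postgres (custom)": 14_000,
    "filesystem": 5_000,
    "memory": 5_000,
}

total = sum(breakdown.values())
# Largest first, with each server's share of the total overhead.
for server, tokens in sorted(breakdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{server:<18} {tokens:>7,}  {tokens / total:.0%}")
print(f"{'total':<18} {total:>7,}")
```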

How to Cut the Bloat

You do not have to disconnect every server. You have to be deliberate about which tools enter context.

Audit what you actually use. Most developers connect a server, use three of its tools regularly, and forget the rest exist. Claude Code’s /mcp command lists every tool; check it against your last week of sessions. If you have not called a tool, it should not be loaded.
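The audit amounts to a set difference between what is registered and what you actually call. The sketch below is minimal and assumes you can get both lists out of your client and session logs yourself. The data structures are hypothetical, not any client's export format.

```python
from collections.abc import Iterable


def unused_tools(registered: dict[str, set[str]],
                 calls: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each server to the registered tools that never appear in the call log."""
    seen = set(calls)
    report: dict[str, list[str]] = {}
    for server, tools in registered.items():
        idle = sorted(tool for tool in tools if (server, tool) not in seen)
        if idle:
            report[server] = idle
    return report


# Hypothetical data: tools each server registers, and (server, tool) pairs
# collected from a week of your own sessions. Neither is a real export format.
registered = {
    "github": {"create_issue", "search_code", "get_pull_request", "create_gist"},
    "linear": {"search_issues", "create_issue", "list_cycles"},
    "postgres": {"query", "describe_table"},
}
calls = [("github", "search_code"), ("github", "get_pull_request"), ("linear", "search_issues")]

report = unused_tools(registered, calls)
for server, idle in report.items():
    # A server with no used tools at all is a candidate to disconnect outright.
    note = "  <- nothing used, disconnect" if len(idle) == len(registered[server]) else ""
    print(f"{server}: {', '.join(idle)}{note}")
```

Keying calls by (server, tool) keeps same-named tools on different servers apart, such as create_issue on both GitHub and Linear here.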

Use scoped server profiles. Claude Code and Cursor both support per-project MCP config. Your blog repo does not need the Postgres server. Your data pipeline repo does not need the Notion server. Configure per-project rather than globally.
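For Claude Code, project-scoped servers live in a .mcp.json file at the repository root. Cursor reads .cursor/mcp.json inside the project. Both use a top-level mcpServers object. The sketch below writes a trimmed project file from a larger set of server entries. The server names and commands are hypothetical placeholders, and it is worth checking your client's documentation for the exact fields it accepts.

```python
import json
import tempfile
from pathlib import Path


def write_project_mcp_config(all_servers: dict, keep: list[str], project_root: Path,
                             filename: str = ".mcp.json") -> Path:
    """Write a project-scoped MCP config containing only the servers this repo needs."""
    missing = [name for name in keep if name not in all_servers]
    if missing:
        raise KeyError(f"unknown servers: {missing}")
    config = {"mcpServers": {name: all_servers[name] for name in keep}}
    path = project_root / filename
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


# Hypothetical server entries; the commands are placeholders, not real packages.
all_servers = {
    "github": {"command": "github-mcp", "args": []},
    "postgres": {"command": "postgres-mcp", "args": ["--dsn", "postgresql://localhost/app"]},
    "notion": {"command": "notion-mcp", "args": []},
}

# A blog repo only needs GitHub; write its config into a throwaway directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    path = write_project_mcp_config(all_servers, ["github"], Path(tmp))
    written = json.loads(path.read_text())
print(json.dumps(written, indent=2))
```

In a real repo you would pass the repository root instead of a temporary directory and commit the file, so everyone on the project gets the same scoped set.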

Prefer deferred tool loading where supported. Some clients now support lazy tool registration, where only tool names load up front and full schemas are fetched on demand. If your client supports it, enable it: token cost can drop 60–80%.
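One way to picture lazy registration is a registry that advertises only names and expands a definition after the model asks for it. This is a toy illustration of the idea, not how any particular client implements it. It counts JSON characters as a stand-in for tokens.

```python
import json


class DeferredToolRegistry:
    """Toy lazy registry: names for every tool go into each prompt, but a tool's
    full definition is included only after it has been requested once."""

    def __init__(self, tools: list[dict]):
        self._tools = {t["name"]: t for t in tools}
        self._loaded: set[str] = set()

    def prompt_payload(self) -> str:
        """Serialized text injected into context on the current turn."""
        names = sorted(self._tools)
        full = [self._tools[n] for n in names if n in self._loaded]
        return json.dumps({"available_tools": names, "loaded": full})

    def load(self, name: str) -> dict:
        """Called when the model asks for a tool; later payloads include its schema."""
        if name not in self._tools:
            raise KeyError(name)
        self._loaded.add(name)
        return self._tools[name]


# Hypothetical tools with deliberately wordy descriptions.
tools = [
    {"name": f"tool_{i}",
     "description": "Does something useful. " * 10,
     "inputSchema": {"type": "object",
                     "properties": {"arg": {"type": "string", "description": "An argument."}}}}
    for i in range(30)
]

registry = DeferredToolRegistry(tools)
eager = len(json.dumps(tools))          # everything serialized up front
lazy = len(registry.prompt_payload())   # names only
registry.load("tool_3")
after_one = len(registry.prompt_payload())
print(f"eager: {eager} chars, lazy: {lazy} chars, after loading one tool: {after_one} chars")
```

The trade-off is an extra round trip the first time a tool is needed, in exchange for not paying for every schema on every turn.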

Write tighter tool descriptions if you author MCP servers. A description should tell the model when to use the tool in 1-2 sentences. Schema doc fields should be short noun phrases, not tutorials.
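A before-and-after of one tool definition in the MCP shape (name, description, inputSchema) shows the kind of trimming meant here. The verbose description is the example quoted earlier. The parameter docs on both sides and the 4-characters-per-token estimate are illustrative assumptions.

```python
import json

# Verbose version: the description quoted earlier plus tutorial-style parameter docs.
# (Other filter parameters it mentions are omitted for brevity.)
verbose = {
    "name": "search_issues",
    "description": (
        "Search for issues using a query string. Supports filters for status, assignee, "
        "priority, and project. Returns up to 50 results sorted by relevance. Use this when "
        "the user asks about specific issues or wants to find issues matching criteria."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string",
                      "description": "The search query string that will be used to find "
                                     "matching issues. Can include keywords from the title or body."},
            "status": {"type": "string", "enum": ["open", "closed"],
                       "description": "Optional status filter. Pass 'open' to only see open "
                                      "issues or 'closed' for closed ones."},
        },
        "required": ["query"],
    },
}

# Tight version: one sentence on when to use it, noun-phrase parameter docs.
tight = {
    "name": "search_issues",
    "description": "Find issues matching a query. Use when the user asks about existing issues.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "search text"},
            "status": {"type": "string", "enum": ["open", "closed"], "description": "issue state"},
        },
        "required": ["query"],
    },
}


def approx_tokens(definition: dict) -> int:
    # ~4 characters per token: a rough proxy, not a real tokenizer.
    return len(json.dumps(definition)) // 4


saved = approx_tokens(verbose) - approx_tokens(tight)
print(f"verbose ~{approx_tokens(verbose)}, tight ~{approx_tokens(tight)}, saved ~{saved} tokens per turn")
```

The schema itself (names, types, enum, required) is unchanged. Only the prose the model re-reads every turn gets shorter.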

Measure before and after. Use your API provider’s token usage logs; Anthropic’s console shows cached vs. uncached input per request. Set a budget (say, 10K tokens of overhead at most) and trim until you are under it.
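In Anthropic's Messages API, each response's usage object splits prompt tokens into input_tokens, cache_creation_input_tokens, and cache_read_input_tokens. Because input_tokens excludes the cached portion, the prompt size is the sum of all three. One way to measure overhead is to send the same first message with and without your MCP servers and subtract, as sketched below. The usage numbers are made up for illustration.

```python
def prompt_tokens(usage: dict) -> int:
    """Total prompt size from an Anthropic Messages API usage object.
    input_tokens excludes cached tokens, so the cache counters are added back in."""
    return (usage.get("input_tokens", 0)
            + usage.get("cache_creation_input_tokens", 0)
            + usage.get("cache_read_input_tokens", 0))


def tool_overhead(with_tools: dict, without_tools: dict) -> int:
    """Tokens attributable to tool definitions: same first message, with vs. without MCP servers."""
    return prompt_tokens(with_tools) - prompt_tokens(without_tools)


BUDGET = 10_000  # the overhead ceiling suggested above

# Hypothetical usage numbers for illustration only; substitute values from your own logs.
baseline = {"input_tokens": 3_200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0}
with_mcp = {"input_tokens": 410, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 21_790}

overhead = tool_overhead(with_mcp, baseline)
status = "within" if overhead <= BUDGET else "over"
print(f"tool overhead: {overhead:,} tokens ({status} the {BUDGET:,} budget)")
```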


The deeper lesson: MCP is an abstraction with a token-shaped cost that never shows up as its own line on the invoice. You pay for it in latency, accuracy, and dollars per turn. Treat each MCP connection like a dependency in package.json: useful when needed, dead weight when not.

FAQ

Does MCP token overhead apply to all clients or just Claude Code?

Every MCP-compatible client (Claude Code, Cursor, Continue, Cline) serializes tool definitions into the model's prompt. The exact format varies, but the cost pattern is the same: descriptions and schemas re-enter context every turn unless the client defers schema loading.

Does prompt caching make this irrelevant?

Caching reduces the per-turn dollar cost substantially for repeated content, but it does not free up context window space. You still lose tokens that could hold your code, conversation history, or retrieval results.

Is there a way to dynamically load tools mid-session?

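The MCP spec lets a server tell the client that its tool list has changed (the notifications/tools/list_changed notification), but it does not mandate on-demand loading of tool schemas, and some clients are only experimenting with on-demand tool discovery. As of mid-2026 this is not standard, so the practical answer is to scope servers per project and audit unused tools.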
