Anthropic has rolled out a major update to Claude Code that directly targets one of the biggest pain points for developers building tool-rich AI agents with the Model Context Protocol (MCP): context bloat from tool definitions. The new feature, MCP Tool Search, introduces a form of “lazy loading” for tools that dramatically cuts token usage while improving accuracy on tool-use tasks.
For teams that have been bumping into context window limits as they wire Claude into growing fleets of MCP servers, this update materially changes how you can design—and scale—your agent stacks.
The problem: context bloat in MCP-based agents
MCP, released by Anthropic in late 2024 as an open standard, was designed to standardize how AI models and agents connect to external tools and data sources—from GitHub repos to local file systems and Docker environments. Claude Code sits on top of this, acting as an agentic programming harness that can call those tools on demand.
The catch: until now, Claude Code typically had to load the full “instruction manual” for every tool exposed by every MCP server in a session. That meant every tool description, parameter schema, and usage note was poured into the model’s context before any real work started.
As the MCP ecosystem grew, this became a serious tax on the system. Thariq Shihipar, a member of the technical staff at Anthropic, noted that MCP servers in the wild commonly expose 50+ tools each, and users were documenting setups with 7+ servers that consumed more than 67,000 tokens just for tool definitions. In a 200,000-token context window, that’s roughly a third of your budget gone before the user types a prompt.
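To see the scale of this startup tax, here is a back-of-envelope sketch in Python. The numbers are purely illustrative (chosen to loosely match the reported setups: 7 servers, roughly 20 tools each, a few hundred tokens per definition), not measurements:

```python
def startup_tax(servers, tools_per_server, tokens_per_tool, window):
    """Estimate tokens consumed by tool definitions before any user input."""
    total = servers * tools_per_server * tokens_per_tool
    return total, total / window

# Illustrative figures only: 7 servers x ~20 tools x ~480 tokens each,
# against a 200,000-token context window.
total, fraction = startup_tax(7, 20, 480, 200_000)
print(total, fraction)  # 67200 tokens, ~34% of the window
```

Even with conservative per-tool estimates, a third of the context window is gone before the first prompt, which matches the reports cited above.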
AI newsletter author Aakash Gupta highlighted an even more extreme example: a single Docker MCP server with 135 tools consuming around 125,000 tokens in definitions alone. In practice, developers were forced into a trade-off:
- Limit MCP servers to a small core subset of tools, or
- Accept that a large share of the context window would be taken up by tool metadata instead of user code, logs, or relevant documents.
The end result was a “startup tax” on agents that made rich tool ecosystems feel expensive, fragile, and difficult to scale.
How MCP Tool Search works under the hood
The new feature—described by Shihipar as “one of our most-requested features” on the Claude Code GitHub tracker—changes the loading strategy rather than the tools themselves.
Claude Code now monitors how much of the context window would be consumed by tool definitions. According to Anthropic’s release notes, when the tool descriptions cross a threshold of about 10% of the available context, the system stops preloading full definitions and instead builds a lightweight search index over them.
At that point, the architecture flips:
- The full text of every tool definition is no longer injected into the prompt.
- Instead, the index is kept around, and when a user asks for a specific action (“deploy this container”, “sync this repo”, etc.), Claude Code queries the index for relevant tools.
- Only the definitions for the tools that match the query are then pulled into the live context.
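The flow above can be sketched as a few lines of Python. This is a hypothetical model of the behavior described in the release notes, not Anthropic's implementation: the function names (`plan_tool_loading`, `search_tools`), the keyword index, and the token-estimator callback are all assumptions for illustration. A production system would presumably use a more sophisticated (e.g., semantic) index:

```python
CONTEXT_WINDOW = 200_000
PRELOAD_THRESHOLD = 0.10  # ~10% of context, per the release notes

def plan_tool_loading(tool_defs, def_tokens):
    """Choose eager preloading or an on-demand search index.

    tool_defs maps tool name -> definition text; def_tokens estimates
    the token cost of one definition. Both are hypothetical helpers.
    """
    total = sum(def_tokens(d) for d in tool_defs.values())
    if total <= PRELOAD_THRESHOLD * CONTEXT_WINDOW:
        return {"mode": "preload", "tools": tool_defs}
    # Past the threshold: keep a lightweight keyword index instead of
    # injecting every definition into the prompt.
    index = {}
    for name, definition in tool_defs.items():
        for word in definition.lower().split():
            index.setdefault(word, set()).add(name)
    return {"mode": "search", "index": index}

def search_tools(index, tool_defs, query):
    """Pull only the definitions matching the query into live context."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return {name: tool_defs[name] for name in hits}
```

With this sketch, a query like "deploy this container" would surface only the container-related tool definitions, leaving the rest of the catalog out of the prompt entirely.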
Gupta analyzed Anthropic’s internal tests and reported that this shift cut token usage for a large tool set from roughly 134,000 tokens down to about 5,000—a reduction of roughly 96%—while still preserving access to the same tools.
This also changes how MCP server authors should think about their configurations. Shihipar pointed out that the server instructions field in the MCP definition—previously more of a “nice-to-have” description—now plays a critical role. Those instructions effectively serve as metadata that helps Claude determine when and how to search for tools, similar to how a skill description informs an assistant when that skill is relevant.
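As a concrete illustration, the MCP initialize response includes an optional `instructions` string. The sketch below shows the kind of descriptive instructions Shihipar's advice points toward, expressed as a plain Python dict mirroring that response shape; the server name, version, and wording are hypothetical examples, not a real configuration:

```python
# Hypothetical MCP initialize result for an imaginary "docker-ops" server.
# With Tool Search, the `instructions` field acts like a skill description:
# it tells the client *when* this server's tools are worth searching for.
initialize_result = {
    "protocolVersion": "2025-06-18",
    "serverInfo": {"name": "docker-ops", "version": "1.0.0"},
    "capabilities": {"tools": {}},
    "instructions": (
        "Manages Docker containers and images: building, deploying, "
        "inspecting, and cleaning up containers. Use this server for "
        "any container lifecycle or registry task."
    ),
}
print(initialize_result["instructions"])
```

The key design point is that the instructions describe the server's domain and the tasks it supports, rather than enumerating individual tools, so the model can decide whether a search against this server is worthwhile.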
What this means for token budgets and context windows
For practitioners, the most immediate win is straightforward: you recover a large portion of your context window that used to be spent on static tool docs. Instead of burning tens of thousands of tokens on descriptions for tools the agent never touches in a session, that capacity can now be used for:
- Longer source files or diffs
- Extended logs or traces
- Multi-step reasoning over project state
- Richer conversation history
In previous setups with heavy MCP configurations, it was not uncommon to see 30–60% of the window consumed at startup. Gupta described this as a “brutal tradeoff”: either pare back servers and tools, or accept that “half your context budget disappears before you start working.” With Tool Search, the cost becomes proportional to usage rather than potential capability.
Because the model now pulls in only those tool definitions that are actually required for a given task, you can attach far more tools to a single agent without paying a linear cost in tokens up front. For teams that have been manually pruning or splitting MCP servers just to keep context usage tolerable, this update loosens those constraints considerably.
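The difference between paying up front and paying per use can be made concrete with a small sketch. The figures are illustrative, chosen to echo the examples reported above (a 135-tool Docker server, a handful of tools actually invoked, plus a small fixed overhead for the search index, all assumed numbers):

```python
def context_cost(num_tools, tokens_per_tool, tools_used, index_overhead):
    """Compare eager preloading with on-demand loading of tool definitions."""
    eager = num_tools * tokens_per_tool                    # old: pay for everything
    lazy = index_overhead + tools_used * tokens_per_tool   # new: pay per use
    return eager, lazy

# Illustrative: 135 tools at ~925 tokens each, 3 actually used in the
# session, ~2,000 tokens assumed for the index itself.
eager, lazy = context_cost(135, 925, 3, 2_000)
print(eager, lazy)  # 124875 vs 4775
```

Under these assumptions, the eager cost tracks the size of the catalog while the lazy cost tracks the size of the task, which is exactly the "proportional to usage" shift described above.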
‘Lazy loading’ and why it can improve accuracy
The token savings are only half the story. Anthropic and community benchmarks suggest that Tool Search also improves the quality of tool selection and instruction following.
Language models are sensitive to irrelevant information in their context. When you stuff the prompt with thousands of lines of tool definitions, subtle differences between similar tools can be harder for the model to track, especially when names or parameters are close—think notification-send-user vs. notification-send-channel. This “needle in a haystack” problem can degrade reasoning and increase the odds of the wrong tool being invoked.
By loading only a small, targeted subset of tools for each task, Tool Search effectively narrows the search space. Boris Cherny, Head of Claude Code, summarized the impact on X: every Claude Code user now gets “way more context, better instruction following, and the ability to plug in even more tools.”
Community-shared benchmarks align with that claim. On internal MCP evaluations:
- Claude Opus 4’s accuracy reportedly increased from 49% to 74% when Tool Search was enabled.
- Opus 4.5 saw an accuracy jump from 79.5% to 88.1% on the same kind of tasks.
These numbers indicate that reducing noise from unused tools doesn’t just save memory—it makes the model more likely to pick and use the right tool at the right time.
Designing MCP servers and tool sets in the new model
This change alters how you should think about constructing and organizing MCP servers.
Previously, best practices tended to emphasize keeping tool sets small and carefully curated per agent, because every extra tool had a fixed cost in tokens. With Tool Search in place, that cost is amortized: tools you rarely use don’t weigh down every session.
Concretely, this implies a few shifts for developers:
- Richer, more descriptive server instructions: Since server instructions help Claude decide when to query tools, treating them as “skill descriptions” becomes important. Clear, concise descriptions of what the server is for, and what kinds of tasks it supports, should help the model trigger the right searches.
- More modular but larger tool catalogs: You no longer have to fear large catalogs purely for context reasons. It becomes more feasible to expose many related tools—e.g., a full suite of Docker, CI/CD, or database maintenance operations—without worrying that every one will be loaded into the prompt.
- Less manual pruning for token reasons: If you previously removed tools or split servers to stay under context limits, you can revisit those choices with Tool Search in mind and focus instead on semantic and operational boundaries (e.g., ownership, permissions, or environment separation).
Anthropic also recommends that developers building MCP clients implement the ToolSearchTool to fully support this dynamic loading behavior. That ensures clients can participate in the same on-demand tool selection pattern that Claude Code now uses.
Parallels to modern software engineering patterns
The architectural shift mirrors a familiar evolution in traditional software infrastructure. Gupta compared the previous approach to “2020-era static imports,” where everything is loaded at startup whether it’s needed or not. Most modern systems, from IDEs to web apps, have moved away from that pattern.
In today’s development environments:
- VS Code doesn’t load every extension fully at startup.
- JetBrains IDEs don’t eagerly inject every plugin’s documentation into memory.
Instead, capabilities are typically activated on demand, and related assets are loaded only when you engage a feature. Anthropic’s adoption of “lazy loading” for tools through MCP Tool Search reflects the same mindset: treat AI agents not as simple chatbots with a few integrations, but as full software platforms with architectural constraints.
In that light, the update is less a new feature and more a sign that the AI tooling stack is maturing. Efficiency, rather than raw capability, is becoming the main engineering challenge.
Implications for tool-rich AI agents and the MCP ecosystem
From an end user’s perspective, this rollout is largely invisible. Claude Code sessions simply feel like they have more memory and better focus, with fewer inexplicable tool misfires. For the ecosystem of MCP-based tooling and agent builders, though, the implications are broader.
Before Tool Search, there was a soft cap on how many tools an agent could practically expose without undermining its own effectiveness. Too many tools meant too much context bloat; too aggressive pruning meant reduced capability. Now, that ceiling is effectively raised, if not removed in many real-world scenarios.
Agents can, in principle, connect to thousands of tools—database connectors, deployment scripts, API clients, filesystem operations—without paying for all of them upfront in tokens. You only incur the cost when those tools are actually needed and pulled into the context.
Gupta framed this as a shift in the “context economy” from scarcity to access: the question becomes less “what can I afford to load?” and more “what capabilities should I make available, knowing they’ll only cost me when used?” That framing opens the door to much more ambitious, tool-rich agents that can act as unified fronts over complex infrastructure.
The update is rolling out immediately for Claude Code users. As more MCP clients adopt Tool Search semantics, it’s likely that the practical definition of a “tool-rich” agent will expand—from a handful of tightly pruned tools to broad catalogs that are activated selectively, without overwhelming the model or the context window.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.