Claude Context Mode

Context Mode is an MCP server that sits between Claude Code and tool outputs to compress raw data before it enters your context window. Inspired by Cloudflare’s Code Mode (which compresses tool definitions), Context Mode handles the opposite direction: tool outputs. A 315 KB response becomes 5.4 KB. That’s a 98% reduction.

Every MCP tool call dumps raw data into your 200K-token context window. A Playwright snapshot costs 56 KB. Twenty GitHub issues cost 59 KB. One access log costs 45 KB. After 30 minutes of work, 40% of your context is gone. Context Mode solves this by processing tool outputs in sandboxes so that only summaries reach the model.

Features

  • 🚀 Batch Execution: Run multiple commands and search multiple queries in a single tool call.
  • 💻 Multi-Language Code Execution: Execute code in 11 languages (JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R, Elixir). Only stdout enters context.
  • 📁 File Processing: Process files in isolated sandboxes. Raw content never leaves the sandbox.
  • 🔍 Full-Text Search Indexing: Chunk markdown content into SQLite FTS5 with BM25 ranking.
  • 🌐 Fetch and Index: Fetch URLs, convert HTML to markdown, and index the content automatically.
  • 🧠 Fuzzy Search with Three-Layer Fallback: Handle typos with Porter stemming, trigram substring matching, and Levenshtein distance correction.
  • 📊 Real-Time Session Statistics: Track context consumption per tool and total savings.
  • 🔄 Automatic Subagent Routing: PreToolUse hook injects routing instructions so subagents use batch_execute by default.
  • 🛡️ Process Isolation: Each execution runs in its own subprocess with separate memory and state.

How to Use It

Installation

/plugin marketplace add mksglu/claude-context-mode
/plugin install context-mode@claude-context-mode

Restart Claude Code after installation. This installs:

  • The MCP server.
  • A PreToolUse hook that automatically routes tool outputs through the sandbox.
  • Slash commands for diagnostics and upgrades.

Slash Commands

| Command | Description |
| --- | --- |
| /context-mode:stats | Show context savings for the current session (per-tool breakdown, tokens consumed, savings ratio) |
| /context-mode:doctor | Run diagnostics (checks runtimes, hooks, FTS5, plugin registration, versions) |
| /context-mode:upgrade | Pull the latest from GitHub, rebuild, migrate the cache, fix hooks |

Alternative Installation Methods

MCP-only installation (no hooks or slash commands):

claude mcp add context-mode -- npx -y context-mode

Local development:

claude --plugin-dir ./path/to/context-mode

Tools Reference

| Tool | Description | Parameters | Context Savings |
| --- | --- | --- | --- |
| batch_execute | Run multiple commands or search multiple queries in one call | commands: string[] (optional), queries: string[] (optional), intent: string (optional) | 986 KB → 62 KB |
| execute | Run code in 11 languages | language: string, code: string, intent: string (optional) | 56 KB → 299 B |
| execute_file | Process files in the sandbox | filepath: string, language: string, intent: string (optional) | 45 KB → 155 B |
| index | Chunk markdown into FTS5 with BM25 ranking | content: string, source: string (identifier for the content) | 60 KB → 40 B |
| search | Query indexed content with multiple queries | queries: string[], limit: number (optional, default 2 per query) | On-demand retrieval |
| fetch_and_index | Fetch a URL, convert to markdown, index | url: string | 60 KB → 40 B |
| stats | Track context consumption in real time | None | N/A (diagnostic) |

How the Sandbox Works

Each execute call spawns an isolated subprocess, so scripts cannot access each other’s memory or state. The subprocess runs your code, captures stdout, and only that stdout enters the conversation context. Raw data (log files, API responses, snapshots) never leaves the sandbox.
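
A minimal sketch of the pattern, assuming Node’s child_process API (the plugin’s actual implementation may differ):

import { spawn } from "node:child_process";

// Run code in a separate process and return only its stdout.
// Raw files the script reads never leave the child process.
function runSandboxed(command: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });
    let stdout = "";
    child.stdout.on("data", (chunk) => (stdout += chunk));
    child.on("error", reject);
    child.on("close", () => resolve(stdout)); // only stdout reaches the caller
  });
}

// Example: reduce a large log file to a single count.
runSandboxed("python3", ["-c", "print(sum(1 for _ in open('access.log')))"])
  .then((summary) => console.log(summary.trim()));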

Eleven language runtimes are available:

  • JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R, Elixir

Bun is auto-detected and provides 3-5x faster JavaScript/TypeScript execution when available.
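
One plausible way such auto-detection could work (an assumption, not necessarily the plugin’s actual check):

import { execSync } from "node:child_process";

// Prefer bun when it is on PATH; otherwise fall back to node.
function jsRuntime(): "bun" | "node" {
  try {
    execSync("bun --version", { stdio: "ignore" });
    return "bun";
  } catch {
    return "node";
  }
}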

Authenticated CLIs work through credential passthrough. Tools like gh, aws, gcloud, kubectl, and docker inherit environment variables and config paths without exposing credentials to the conversation.
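
A hedged sketch of credential passthrough: the child inherits your environment (and therefore config paths such as ~/.config/gh), while the conversation only sees whatever the script prints:

import { spawn } from "node:child_process";

// gh finds its existing login because the subprocess inherits process.env;
// the credentials themselves are never printed into the context.
const child = spawn("gh", ["issue", "list", "--limit", "20", "--json", "title"], {
  env: process.env,                     // credential passthrough
  stdio: ["ignore", "pipe", "inherit"],
});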

Intent-Driven Filtering

When output exceeds 5 KB and you provide an intent parameter, Context Mode switches to intent-driven filtering. It indexes the full output into the knowledge base, searches for sections matching your intent, and returns only relevant matches with a vocabulary of searchable terms for follow-up queries.
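
A sketch of that control flow, with naive in-memory stand-ins for the SQLite knowledge base (the function names and threshold handling are assumptions):

// In-memory stand-ins for the real knowledge base.
const kb: string[] = [];
function indexIntoKnowledgeBase(content: string): void {
  kb.push(...content.split(/\n{2,}/)); // naive chunking by blank lines
}
function searchKnowledgeBase(query: string): string[] {
  const terms = query.toLowerCase().split(/\s+/);
  return kb.filter((chunk) => terms.some((t) => chunk.toLowerCase().includes(t)));
}

const THRESHOLD = 5 * 1024; // 5 KB

function filterOutput(output: string, intent?: string): string {
  if (!intent || output.length <= THRESHOLD) return output; // small outputs pass through
  indexIntoKnowledgeBase(output);               // full output stays out of context
  const matches = searchKnowledgeBase(intent);  // only intent-relevant chunks return
  return matches.slice(0, 5).join("\n---\n");
}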

How the Knowledge Base Works

The index tool chunks markdown content by headings while keeping code blocks intact. It stores chunks in a SQLite FTS5 (Full-Text Search 5) virtual table.

Search uses BM25 ranking, a probabilistic relevance algorithm that scores documents based on:

  • Term frequency (how often terms appear)
  • Inverse document frequency (how rare terms are across documents)
  • Document length normalization

Porter stemming applies at index time. Words like “running”, “runs”, and “ran” match the same stem.
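
As a concrete sketch, here is what an FTS5 table with the porter tokenizer and a BM25-ranked query look like, using better-sqlite3 (the tooling choice and schema are assumptions):

import Database from "better-sqlite3";

const db = new Database(":memory:");
// The porter tokenizer stems at index time, so "running"/"runs" share a stem.
db.exec("CREATE VIRTUAL TABLE chunks USING fts5(source, content, tokenize='porter')");
db.prepare("INSERT INTO chunks VALUES (?, ?)").run(
  "react-docs",
  "Effects run after every render unless you pass a dependency array.",
);

// bm25() yields a rank where lower means more relevant.
const rows = db
  .prepare(
    "SELECT source, content, bm25(chunks) AS rank FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT 2",
  )
  .all("running"); // matches "run" via the shared stem
console.log(rows);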

The search tool returns relevant content snippets focused around matching query terms. You get the actual indexed content with smart extraction around what you’re looking for, not full documents or approximations.

fetch_and_index extends this to URLs: fetch the URL, convert HTML to markdown, chunk the content, and index it. The raw page never enters context.
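
A hedged sketch of that pipeline (Turndown as the HTML-to-markdown converter is an assumption, as is the chunking regex):

import TurndownService from "turndown";

// Stand-in for the FTS5 insert shown above.
function indexChunk(source: string, content: string): void { /* INSERT INTO chunks ... */ }

async function fetchAndIndex(url: string): Promise<string> {
  const html = await (await fetch(url)).text();          // raw page stays here
  const markdown = new TurndownService().turndown(html); // HTML → markdown
  const chunks = markdown.split(/^(?=#{1,6} )/m);        // chunk at headings
  chunks.forEach((chunk) => indexChunk(url, chunk));
  return `indexed ${chunks.length} chunks from ${url}`;  // only this summary returns
}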

Fuzzy Search Implementation

Search uses a three-layer fallback system:

  • Layer 1 — Porter stemming: Standard FTS5 MATCH with porter tokenizer. “caching” matches “cached”, “caches”, “cache”.
  • Layer 2 — Trigram substring: FTS5 trigram tokenizer matches partial strings. “useEff” finds “useEffect”. “authenticat” finds “authentication”.
  • Layer 3 — Fuzzy correction: Levenshtein distance corrects typos before re-searching. “kuberntes” becomes “kubernetes”. “autentication” becomes “authentication”.

The searchWithFallback method cascades through all three layers and annotates results with matchLayer so you know which layer resolved the query.
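
The Layer 3 correction can be sketched with the classic dynamic-programming Levenshtein distance (the vocabulary lookup shown is an assumption about how the corrected term is chosen):

// Edit distance between two strings (insert/delete/substitute, cost 1 each).
function levenshtein(a: string, b: string): number {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                   // deletion
        d[i][j - 1] + 1,                                   // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
  return d[a.length][b.length];
}

// Correct a typo against the indexed vocabulary, then re-search.
function nearestTerm(query: string, vocabulary: string[]): string {
  return vocabulary.reduce((best, term) =>
    levenshtein(query, term) < levenshtein(query, best) ? term : best,
  );
}

console.log(nearestTerm("kuberntes", ["kubernetes", "authentication", "cache"])); // "kubernetes"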

Smart Snippets

Instead of returning the first N characters (which might miss important content), Context Mode finds where your query terms appear and returns windows around those matches. If you search for “authentication JWT token”, you get the paragraphs where those terms actually appear.
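
A minimal sketch of match-window extraction (the window size and lack of overlap merging are assumptions):

// Return a window of text around each query-term hit instead of the
// document's first N characters. Only the first hit per term is shown here.
function smartSnippets(doc: string, query: string, radius = 120): string[] {
  const lower = doc.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .map((term) => lower.indexOf(term))
    .filter((i) => i >= 0)
    .map((i) => doc.slice(Math.max(0, i - radius), i + radius));
}

const doc = "...much setup text... JWT tokens are validated during authentication ...";
console.log(smartSnippets(doc, "authentication JWT"));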

Progressive Search Throttling

The search tool prevents context flooding from excessive individual calls:

  • Calls 1-3: Normal results (2 per query)
  • Calls 4-8: Reduced results (1 per query) + warning
  • Calls 9+: Blocked (redirects to batch_execute)

This encourages batching queries via search(queries: ["q1", "q2", "q3"]) or batch_execute instead of making dozens of individual calls.
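
A sketch of the throttle using the thresholds above (the counter and error shape are assumptions):

// Per-session counter for individual search calls.
let searchCalls = 0;

function resultsPerQuery(): number {
  searchCalls += 1;
  if (searchCalls <= 3) return 2;  // calls 1-3: normal (2 results per query)
  if (searchCalls <= 8) return 1;  // calls 4-8: reduced, with a warning
  throw new Error("search throttled: batch queries or use batch_execute"); // calls 9+
}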

Session Statistics Example

Run /context-mode:stats to see real-time context savings:

| Metric | Value |
| --- | --- |
| Session | 1.4 min |
| Tool calls | 1 |
| Total data processed | 9.6 MB |
| Kept in sandbox | 9.6 MB |
| Entered context | 0.3 KB |
| Tokens consumed | ~82 |
| Context savings | 24,576.0x (99% reduction) |

Without Context Mode, 9.6 MB of raw tool output would flood your context window. Instead, 99% stayed in the sandbox, saving approximately 2.4 million tokens.

Subagent Routing

When installed as a plugin, Context Mode includes a PreToolUse hook that automatically injects routing instructions into subagent (Task tool) prompts. Subagents learn to use batch_execute as their primary tool and search(queries: [...]) for follow-ups without manual configuration.

Bash subagents are automatically upgraded to general-purpose so they can access MCP tools. Without this upgrade, a subagent_type: "Bash" agent only has the Bash tool and cannot call batch_execute or search.
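
Conceptually, the hook rewrites the Task tool’s input before it runs. A sketch under the assumption that the hook sees and returns the tool input (field names here are illustrative, not Claude Code’s exact hook schema):

const ROUTING_NOTE =
  "Use batch_execute as your primary tool and search(queries: [...]) for follow-ups.";

interface TaskInput { subagent_type?: string; prompt?: string }

function rewriteTaskInput(input: TaskInput): TaskInput {
  return {
    ...input,
    // Bash subagents cannot call MCP tools, so upgrade them.
    subagent_type: input.subagent_type === "Bash" ? "general-purpose" : input.subagent_type,
    prompt: `${ROUTING_NOTE}\n\n${input.prompt ?? ""}`,
  };
}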

Example Prompts

These prompts work out of the box. Run /context-mode:stats after each to see the savings.

Deep repository research (5 calls, 62 KB context vs 986 KB raw):

Research https://github.com/modelcontextprotocol/servers — architecture, tech stack,
top contributors, open issues, and recent activity.

Git history analysis (1 call, 5.6 KB context):

Clone https://github.com/facebook/react and analyze the last 500 commits:
top contributors, commit frequency by month, and most changed files.

Web scraping (1 call, 3.2 KB context):

Fetch the Hacker News front page, extract all posts with titles, scores,
and domains. Group by domain.

Large JSON API (7.5 MB raw → 0.9 KB context):

Create a local server that returns a 7.5 MB JSON with 20,000 records and a secret
hidden at index 13000. Fetch the endpoint, find the hidden record, and show me
exactly what's in it.

Documentation search (2 calls, 1.8 KB context):

Fetch the React useEffect docs, index them, and find the cleanup pattern
with code examples.

Performance Numbers

Measured across real-world scenarios:

| Scenario | Raw Size | Context Size | Savings |
| --- | --- | --- | --- |
| Playwright snapshot | 56.2 KB | 299 B | 99% |
| GitHub Issues (20) | 58.9 KB | 1.1 KB | 98% |
| Access log (500 requests) | 45.1 KB | 155 B | 100% |
| React docs chunk | 5.9 KB | 261 B | 96% |
| Analytics CSV (500 rows) | 85.5 KB | 222 B | 100% |
| Git log (153 commits) | 11.6 KB | 107 B | 99% |
| Test output (30 suites) | 6.0 KB | 337 B | 95% |
| Repo research (subagent) | 986 KB | 62 KB | 94% |

Over a full session: 315 KB of raw output becomes 5.4 KB. Session time before slowdown goes from approximately 30 minutes to approximately 3 hours. Context remaining after 45 minutes: 99% instead of 60%.

FAQs

Q: What are the system requirements for Context Mode?
A: Node.js 18 or higher is required. Claude Code with MCP support is necessary for full functionality. Bun is optional but recommended for 3-5x faster JavaScript/TypeScript execution when auto-detected.

Q: How does the sandbox isolate code execution?
A: Each execute call spawns an isolated subprocess with its own process boundary, memory space, and state. Scripts cannot access each other’s data. The subprocess runs your code, captures stdout, and only that stdout enters the conversation context. Raw data never leaves the sandbox.

Q: Can I use authenticated CLI tools like gh or aws with Context Mode?
A: Yes. Authenticated CLIs work through credential passthrough. Tools like gh, aws, gcloud, kubectl, and docker inherit environment variables and config paths from your shell without exposing those credentials to the conversation.

Q: What happens when output exceeds 5 KB and I provide an intent?
A: Context Mode switches to intent-driven filtering. It indexes the full output into the SQLite knowledge base, searches for sections matching your intent, and returns only relevant matches with a vocabulary of searchable terms for follow-up queries.

Q: How does fuzzy search handle typos?
A: Search uses a three-layer fallback system. Layer 1 applies Porter stemming to match word variations. Layer 2 uses trigram substring matching for partial terms. Layer 3 applies Levenshtein distance to correct typos before re-searching. Results are annotated with the match layer that resolved the query.

Q: What is FTS5 and BM25 ranking?
A: FTS5 is SQLite’s full-text search engine. BM25 is a probabilistic relevance algorithm that scores documents based on term frequency, inverse document frequency, and document length normalization. Context Mode uses both with Porter stemming at index time.

Q: How do I batch multiple queries to avoid throttling?
A: Use the batch_execute tool with the queries parameter: batch_execute(queries: ["query1", "query2", "query3"]). You can also batch commands with the commands parameter. This counts as one tool call instead of multiple individual calls.
