Claude Code Local is a free open-source project that allows you to use Claude Code with local AI models on an Apple Silicon Mac.
You can enter prompts, approve file changes, run shell commands, and work through Claude Code as usual, but a local MLX server sends requests to models such as Gemma, Qwen, or Llama instead of a hosted Claude model.
The stack replaces Anthropic’s cloud endpoint with a lightweight MLX-native server that speaks directly to the Anthropic API.
Claude Code thinks it is talking to the cloud, but every prompt, every file, every tool call stays on localhost.
You can pick from a lineup of local models, such as Gemma 4 31B, Qwen 3.5 122B, DeepSeek V4 Flash, or Llama 3.3 70B, and switch between them using an environment variable.
The same server handles all three model families, translates tool-use formats, and reuses prompt caches so coding sessions feel responsive.
Features
- Runs local models through a single MLX-native Anthropic API server.
- Supports Claude Code’s full tool-use interface across Bash, Read, Edit, Write, Grep, and Glob tools.
- 4 one-click launchers for coding, browser control, hands-free voice, and phone-based control.
- Recovers garbled tool calls from small models with automatic retry and format repair.
- Strips Claude Code’s 10K-token harness prompt in code mode for a 28× token reduction.
- Reuses prompt caches across requests so the system prompt prefills only once per session.
- Passes MCP tool definitions through to the local model and translates responses back into Anthropic format.
- Includes a browser agent that controls a real Brave browser via Chrome DevTools Protocol.
- Provides a full voice pipeline that uses on-device speech recognition and a cloned-voice TTS.
Use Cases
- Repair a failing unit test by tracing the error, editing one function, and running the test command from Claude Code.
- Explain a stack trace or a legacy configuration file without uploading the code to a hosted model endpoint.
- Draft a utility function, shell command, regular expression, or migration script on a 16 GB MacBook.
- Search a local repository for related files before preparing a contained patch.
- Run larger local models on a 64 GB or 96 GB Mac for repeated maintenance tasks, repository searches, and tool-assisted edits.
How to Use Claude Code Local
Standard Apple Silicon Setup
Install the Claude Code CLI, clone the repository, and run the setup script:
npm install -g @anthropic-ai/claude-code
git clone https://github.com/nicedreamzapp/claude-code-local
cd claude-code-local
bash setup.shThe installer checks your unified memory, installs Python 3.12 and MLX dependencies, downloads a starter model, and creates Claude Local.command on your Desktop.
Open Claude Local.command to start the local server and launch Claude Code. The session routes requests through localhost:4000 to the selected model on your Apple Silicon GPU.
Model Paths by Mac Memory
| Unified Memory | Default Model Route | Suitable Work |
|---|---|---|
| Under 32 GB | Qwen 3.5 4B through setup.sh | Short prompts, code explanations, small snippets |
| 16 GB | Qwen 2.5 Coder 14B through Chat or Agentico launchers | Debugging, contained fixes, limited local tool use |
| 32 GB | Gemma 4 12B | Small-to-medium coding tasks |
| 64 GB | Gemma 4 31B | Regular local coding sessions |
| 96 GB or more | Qwen 3.5 122B | Larger repositories and heavier local agent work |
| 128 GB or more | DeepSeek V4 Flash route | Long-context local workflows |
The 32 GB and 64 GB defaults use abliterated Gemma builds. Abliteration changes a model’s refusal behavior; it does not turn the model into a Claude-equivalent coding system.
Mac Base and Pro 16 GB Setup
The 16 GB route uses the included Claude Chat.command and Claude Agentico.command launchers. Both target Qwen 2.5 Coder 14B in 4-bit MLX format, which uses about 7.8 GB of model weights.
Run the standard setup once, then open the launcher that matches the task:
cd claude-code-local
open "launchers/Claude Chat.command"Use Chat mode for code questions, pasted stack traces, snippets, configuration explanations, and planning. It starts Claude Code without tools.
open "launchers/Claude Agentico.command"Use Agentico mode when Claude Code needs to read files, write a focused patch, search a project, or run shell commands. The launcher uses low-effort mode and a 4-bit KV cache to fit the Claude Code prompt and local model into 16 GB of unified memory.
Expect about 10 to 15 tokens per second with Qwen 2.5 Coder 14B on a 16 GB Mac. Keep tool-driven tasks narrow. A focused bug fix or test update fits this setup far better than a multi-file refactor with a long chain of edits and commands.
Local Model Sessions and Connected Tools
A local coding session sends the model request from Claude Code to the MLX server on your Mac. After the repository, dependencies, and model weights are downloaded, the model route works without an internet connection.
Connected tools use their own data paths:
| Tool or Feature | Data Path |
|---|---|
| Local model and local filesystem tools | Mac only |
| GitHub MCP server | GitHub API |
| Web-search MCP server | Search provider API |
| Browser Agent | Local Brave browser and the pages it opens |
| Phone and iMessage workflows | Apple Messages and related companion tools |
The 16 GB Chat and Agentico launchers disable non-essential Claude Code traffic, including telemetry, marketplace auto-install, background tasks, and automatic updates. GitHub, web-search, browser, and remote-access tools remain separate connections.
Alternatives
- Claude Code Proxy: Use Claude Code With Any Free and Local Models
- SmallCode: Fast, Free, Local AI Coding Agent for Small LLMs
- OpenClaude: Multi-Model AI Coding Agent CLI
- Claude Code Commands Cheat Sheet
- Codebase Memory MCP Server: Local Code Intelligence for AI Agents
- The Ultimate Claude Code Resource List
- 7 Best CLI AI Coding Agents
Pros
- Zero cloud dependency.
- No API key or subscription needed.
- Verified offline with
lsof. - Multiple model choices in one server.
- Works with Claude Code’s MCP plugin system.
- 65 tok/s on a fast Mac.
- Includes browser agent and voice mode.
Cons
- Requires an Apple Silicon Mac with large RAM.
- Local models produce weaker reasoning than cloud Claude.
- Large model downloads (18–81 GB).
FAQs
Q: Is Claude Code Local free?
A: Yes. The entire stack is open-source under the MIT license. You pay nothing for model inference.
Q: Does it require an internet connection?
A: No. The server, models, and launchers run entirely offline. The launchers suppress all nonessential outbound traffic from the Claude Code binary.
Q: Can it fully replace a cloud Claude Code subscription?
A: For many coding tasks, yes—especially when privacy or offline access matters. Cloud Claude still provides stronger reasoning on complex multi-step problems and larger codebases. Use the local stack for sensitive work and fall back to cloud for the hardest prompts when connectivity and data policies allow it.
Q: How does tool-call reliability compare to cloud Claude?
A: The local server includes format repair and automatic retries that recovered a previously failing multi-step calendar task in automated tests. The test suite passed 98/98 runs. Real-world reliability depends on the model, prompt complexity, and session length.










