Claude Code Local: Run Claude Code With Local AI on Apple Silicon

Claude Code Local is a free open-source project that allows you to use Claude Code with local AI models on an Apple Silicon Mac.

You can enter prompts, approve file changes, run shell commands, and work through Claude Code as usual, but a local MLX server sends requests to models such as Gemma, Qwen, or Llama instead of a hosted Claude model.

Visit Claude Code Local

The stack replaces Anthropic’s cloud endpoint with a lightweight MLX-native server that speaks directly to the Anthropic API.

Claude Code thinks it is talking to the cloud, but every prompt, every file, every tool call stays on localhost.

You can pick from a lineup of local models, such as Gemma 4 31B, Qwen 3.5 122B, DeepSeek V4 Flash, or Llama 3.3 70B, and switch between them using an environment variable.

The same server handles all three model families, translates tool-use formats, and reuses prompt caches so coding sessions feel responsive.

Features

Runs local models through a single MLX-native Anthropic API server.
Supports Claude Code’s full tool-use interface across Bash, Read, Edit, Write, Grep, and Glob tools.
4 one-click launchers for coding, browser control, hands-free voice, and phone-based control.
Recovers garbled tool calls from small models with automatic retry and format repair.
Strips Claude Code’s 10K-token harness prompt in code mode for a 28× token reduction.
Reuses prompt caches across requests so the system prompt prefills only once per session.
Passes MCP tool definitions through to the local model and translates responses back into Anthropic format.
Includes a browser agent that controls a real Brave browser via Chrome DevTools Protocol.
Provides a full voice pipeline that uses on-device speech recognition and a cloned-voice TTS.

Use Cases

Repair a failing unit test by tracing the error, editing one function, and running the test command from Claude Code.
Explain a stack trace or a legacy configuration file without uploading the code to a hosted model endpoint.
Draft a utility function, shell command, regular expression, or migration script on a 16 GB MacBook.
Search a local repository for related files before preparing a contained patch.
Run larger local models on a 64 GB or 96 GB Mac for repeated maintenance tasks, repository searches, and tool-assisted edits.

How to Use Claude Code Local

Standard Apple Silicon Setup

Install the Claude Code CLI, clone the repository, and run the setup script:

npm install -g @anthropic-ai/claude-code
git clone https://github.com/nicedreamzapp/claude-code-local
cd claude-code-local
bash setup.sh

The installer checks your unified memory, installs Python 3.12 and MLX dependencies, downloads a starter model, and creates Claude Local.command on your Desktop.

Open Claude Local.command to start the local server and launch Claude Code. The session routes requests through localhost:4000 to the selected model on your Apple Silicon GPU.

Model Paths by Mac Memory

Unified Memory	Default Model Route	Suitable Work
Under 32 GB	Qwen 3.5 4B through `setup.sh`	Short prompts, code explanations, small snippets
16 GB	Qwen 2.5 Coder 14B through Chat or Agentico launchers	Debugging, contained fixes, limited local tool use
32 GB	Gemma 4 12B	Small-to-medium coding tasks
64 GB	Gemma 4 31B	Regular local coding sessions
96 GB or more	Qwen 3.5 122B	Larger repositories and heavier local agent work
128 GB or more	DeepSeek V4 Flash route	Long-context local workflows

The 32 GB and 64 GB defaults use abliterated Gemma builds. Abliteration changes a model’s refusal behavior; it does not turn the model into a Claude-equivalent coding system.

Mac Base and Pro 16 GB Setup

The 16 GB route uses the included Claude Chat.command and Claude Agentico.command launchers. Both target Qwen 2.5 Coder 14B in 4-bit MLX format, which uses about 7.8 GB of model weights.

Run the standard setup once, then open the launcher that matches the task:

cd claude-code-local
open "launchers/Claude Chat.command"

Use Chat mode for code questions, pasted stack traces, snippets, configuration explanations, and planning. It starts Claude Code without tools.

open "launchers/Claude Agentico.command"

Use Agentico mode when Claude Code needs to read files, write a focused patch, search a project, or run shell commands. The launcher uses low-effort mode and a 4-bit KV cache to fit the Claude Code prompt and local model into 16 GB of unified memory.

Expect about 10 to 15 tokens per second with Qwen 2.5 Coder 14B on a 16 GB Mac. Keep tool-driven tasks narrow. A focused bug fix or test update fits this setup far better than a multi-file refactor with a long chain of edits and commands.

Local Model Sessions and Connected Tools

A local coding session sends the model request from Claude Code to the MLX server on your Mac. After the repository, dependencies, and model weights are downloaded, the model route works without an internet connection.

Connected tools use their own data paths:

Tool or Feature	Data Path
Local model and local filesystem tools	Mac only
GitHub MCP server	GitHub API
Web-search MCP server	Search provider API
Browser Agent	Local Brave browser and the pages it opens
Phone and iMessage workflows	Apple Messages and related companion tools

The 16 GB Chat and Agentico launchers disable non-essential Claude Code traffic, including telemetry, marketplace auto-install, background tasks, and automatic updates. GitHub, web-search, browser, and remote-access tools remain separate connections.

Alternatives

Pros

Zero cloud dependency.
No API key or subscription needed.
Verified offline with lsof.
Multiple model choices in one server.
Works with Claude Code’s MCP plugin system.
65 tok/s on a fast Mac.
Includes browser agent and voice mode.

Cons

Requires an Apple Silicon Mac with large RAM.
Local models produce weaker reasoning than cloud Claude.
Large model downloads (18–81 GB).

FAQs

Q: Is Claude Code Local free?
A: Yes. The entire stack is open-source under the MIT license. You pay nothing for model inference.

Q: Does it require an internet connection?
A: No. The server, models, and launchers run entirely offline. The launchers suppress all nonessential outbound traffic from the Claude Code binary.

Q: Can it fully replace a cloud Claude Code subscription?
A: For many coding tasks, yes—especially when privacy or offline access matters. Cloud Claude still provides stronger reasoning on complex multi-step problems and larger codebases. Use the local stack for sensitive work and fall back to cloud for the hardest prompts when connectivity and data policies allow it.

Q: How does tool-call reliability compare to cloud Claude?
A: The local server includes format repair and automatic retries that recovered a previously failing multi-step calendar task in automated tests. The test suite passed 98/98 runs. Real-world reliability depends on the model, prompt complexity, and session length.