Local AI Coding Agent for Small LLMs

SmallCode is an open-source, terminal-native AI coding agent built for local language models in the 8B–35B parameter range. It runs on consumer hardware via LM Studio, Ollama, or any OpenAI-compatible endpoint.

SmallCode suits developers who want a private coding agent that runs on their own machine. It helps smaller LLMs handle coding tasks through budget-managed context, forgiving tool parsing, TODO-based planning, patch-first editing, persistent shell sessions, working memory, and optional cloud escalation for hard failures.

Features

Optimized for 8B to 35B local models on consumer hardware.
Manages context budgets actively: tool results cap at 4,000 characters, mid-turn eviction drops old results when the window fills, and semantic compression summarizes history before dropping it.
Parses tool calls from JSON, YAML, XML, Hermes format, or plain text, and auto-repairs common parameter errors.
Edits files through search-and-replace patches rather than full rewrites, which reduces truncation and hallucination errors common in small models.
Decomposes complex tasks into a TODO file and validates each step through lint or compile before advancing.
Detects repetition loops, patch spirals, and greeting regression (when the model loses task context) and intervenes before wasted tokens accumulate.
Maintains a persistent shell session so directory changes (cd), environment variables, and shell variables survive across tool calls.
Injects a compact project summary on startup covering runtime, package manager, framework, entry point, and build and test commands, saving the 3–5 tool calls small models typically spend on discovery.
Stores a per-session evidence log of what was tried, what worked, and what failed, searchable across future sessions.
Caps thinking budgets for reasoning models (Qwen3, DeepSeek R1, GPT-5 reasoning variants) to prevent token waste on trivial tasks.
Supports optional cloud escalation to Claude, OpenAI, or DeepSeek when a local model hard-fails after retry and decomposition.
Includes a programmatic API so you can embed SmallCode in CI pipelines or custom tooling.
Ships a benchmark harness with three suites (smoke, polyglot-mini, and tool-use) that you can run against any local model.
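
The context-budget behavior in the list above (capping each tool result, then evicting old results when the window fills) can be sketched roughly as follows. This is a minimal illustration, not SmallCode's implementation: the function names and the message shape are hypothetical, size is measured in characters as a stand-in for tokens, and the semantic compression step is left out. Only the 4,000-character cap comes from the text.

```javascript
// Hypothetical names throughout; only the 4,000-character cap is from the feature list.
const TOOL_RESULT_CAP = 4000;

// Truncate a single tool result to the cap and note how much was cut.
function capToolResult(text, cap = TOOL_RESULT_CAP) {
  if (text.length <= cap) return text;
  const dropped = text.length - cap;
  return text.slice(0, cap) + `\n[truncated ${dropped} characters]`;
}

// Drop the oldest tool results until the history fits the budget.
// Non-tool messages (user, assistant) are never evicted in this sketch.
function evictOldToolResults(messages, budgetChars) {
  const kept = [...messages];
  let total = kept.reduce((sum, m) => sum + m.content.length, 0);
  for (let i = 0; i < kept.length && total > budgetChars; ) {
    if (kept[i].role === 'tool') {
      total -= kept[i].content.length;
      kept.splice(i, 1); // do not advance: the next message shifted into slot i
    } else {
      i += 1;
    }
  }
  return kept;
}

const history = [
  { role: 'user', content: 'fix the failing test' },
  { role: 'tool', content: capToolResult('x'.repeat(6000)) },
  { role: 'tool', content: 'short, recent result' },
];
console.log(evictOldToolResults(history, 100).map((m) => m.role));
```

Evicting from the oldest end keeps the most recent tool output, which is usually what the model needs for its next step.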

Use Cases

Run an AI coding agent against a local model from LM Studio or Ollama.
Edit files in a project through patch-based changes.
Create small scripts, backend files, configuration files, and tests.
Let a smaller model follow multi-step refactors through a persistent task plan.
Use local coding workflows where privacy and offline execution matter.
Benchmark local models across coding, tool-use, and polyglot task suites.
Build custom coding tools around the SmallCode JavaScript API.
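
To make the patch-based editing idea concrete, here is a small sketch of a search-and-replace edit. It assumes a patch is a pair of strings (text to find, text to put in its place) and refuses to apply unless the search text matches exactly once. The function name and the uniqueness rule are assumptions of this sketch, not documented SmallCode behavior.

```javascript
// Apply one search-and-replace patch to a file's text.
// Throws instead of guessing when the search text is missing or ambiguous.
function applyPatch(source, search, replace) {
  if (search === '') throw new Error('search text must not be empty');
  const first = source.indexOf(search);
  if (first === -1) throw new Error('search text not found');
  if (source.indexOf(search, first + 1) !== -1) {
    throw new Error('search text is ambiguous (matches more than once)');
  }
  // Slicing avoids String.prototype.replace, which treats "$" sequences
  // in the replacement string specially.
  return source.slice(0, first) + replace + source.slice(first + search.length);
}

const before = 'def greet():\n    print("hi")\n';
const after = applyPatch(before, 'print("hi")', 'print("hello world")');
console.log(after);
```

Because the model only has to emit the changed fragment, a short output window is enough even for a large file, which is the motivation the feature list gives for patch-first editing.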

How to Get Started

You can install SmallCode via npm or prebuilt binaries for Windows, macOS, and Linux. The prebuilt option bundles Node.js and all native dependencies, so you skip node-gyp and C++ build tools entirely.

Install globally via npm:

npm install -g smallcode

Or run without installing:

npx smallcode

Linux and macOS one-line install (prebuilt binary):

bash <(curl -fsSL https://raw.githubusercontent.com/Doorman11991/smallcode/main/install.sh)

Windows one-line install (prebuilt binary):

iwr -Uri https://raw.githubusercontent.com/Doorman11991/smallcode/main/install.ps1 -UseBasicParsing | iex

The install script downloads the correct binary for your platform, extracts it to ~/.smallcode, and adds it to your PATH.

Requirements:

Node.js 18 or later for npm installs (LTS versions 20.x and 22.x have prebuilt SQLite binaries); the prebuilt SmallCode binaries bundle their own Node.js
A running local LLM server: LM Studio, Ollama, or any OpenAI-compatible endpoint

On non-LTS Node versions (for example 23.x or 25.x), better-sqlite3 requires native compilation. Linux needs python3, make, and gcc. macOS needs the Xcode Command Line Tools. Windows needs Visual Studio Build Tools with the “Desktop development with C++” workload.

If the build fails, SmallCode falls back to JSON-based memory automatically.

Configuration:

Create a .env file in your project root:

# Required
SMALLCODE_MODEL=your-model-name
SMALLCODE_BASE_URL=http://localhost:1234/v1
# Optional: cloud fallback on hard fail
# ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# DEEPSEEK_API_KEY=sk-...
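
Before starting SmallCode, it can help to confirm that the URL in SMALLCODE_BASE_URL actually answers. The sketch below assumes the server implements the standard OpenAI-compatible model listing route (GET /models under the base URL) and that Node 18+ is in use, so a global fetch is available. The helper name listModels is hypothetical and not part of SmallCode.

```javascript
// Ask an OpenAI-compatible server which models it serves.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function listModels(baseUrl, fetchImpl = fetch) {
  const url = baseUrl.replace(/\/+$/, '') + '/models'; // tolerate a trailing slash
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`endpoint returned HTTP ${res.status}`);
  const body = await res.json();
  // The OpenAI list format wraps model entries in a `data` array.
  return body.data.map((m) => m.id);
}

// Example call (needs a running server, so it is left commented out):
// listModels(process.env.SMALLCODE_BASE_URL).then(console.log, console.error);
```

If the name set in SMALLCODE_MODEL does not appear in the returned list, the model is probably not loaded on the server yet.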

Start SmallCode from your project directory:

cd my-project
smallcode

Environment Variables

Variable	Default	Description
`SMALLCODE_MODEL`	—	Required. Local model name.
`SMALLCODE_BASE_URL`	—	Required. OpenAI-compatible endpoint URL.
`SMALLCODE_THINKING_BUDGET`	`2000`	Token cap for reasoning model thinking blocks.
`SMALLCODE_THINKING_DISABLE`	`false`	Set `true` to disable thinking entirely.
`SMALLCODE_KNOWLEDGE_MAX_TOKENS`	`1500`	Token budget for `knowledge/` directory injection.
`SMALLCODE_WEB_BROWSE`	`false`	Set `true` to enable web search and fetch tools.
`SMALLCODE_WRITE_GUARD`	`true`	Blocks first write to an unread existing file.
`SMALLCODE_DEDUP`	`true`	Short-circuits identical read-only tool calls.
`SMALLCODE_EVIDENCE_DISABLE`	`false`	Set `true` to disable the evidence store.
`SMALLCODE_PLAN`	—	Set `true` or `false` to force plan-then-execute mode on or off.
`SMALLCODE_SNAPSHOT_AUTO_ROLLBACK`	`false`	Set `true` for automatic rollback on hard validation fail.
`SMALLCODE_SNAPSHOT`	`true`	Set `false` to disable snapshots entirely.
`SMALLCODE_TEST_RUNNER`	—	Override auto-detected test command.
`SMALLCODE_TEST_DISABLE`	`false`	Set `true` to disable test runner detection.
`SMALLCODE_BOOTSTRAP`	`true`	Set `false` to disable project summary injection.
`SMALLCODE_TEMP_ADAPT`	`true`	Set `false` to disable adaptive retry temperature.
`SMALLCODE_TRUST_DECAY`	`true`	Set `false` to disable per-tool failure tracking.
`SMALLCODE_SHELL_PERSIST`	`true`	Set `false` to use a fresh shell per bash call.
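
As an illustration of how the defaults in the table combine with values from a .env file, the sketch below reads a few of these variables from an environment-like object. The variable names and defaults are taken from the table; the helper names and the parsing rules (only the string "true" enables a flag, invalid numbers fall back to the default) are assumptions, since SmallCode's own parsing is not described here.

```javascript
// Read a boolean flag, falling back to the default when unset or empty.
function readFlag(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  return raw.trim().toLowerCase() === 'true';
}

// Read a non-negative integer, falling back on missing or invalid input.
function readInt(env, name, fallback) {
  const n = Number.parseInt(env[name] ?? '', 10);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

// Stand-in for process.env after a .env file has been loaded.
const env = { SMALLCODE_WEB_BROWSE: 'true', SMALLCODE_THINKING_BUDGET: '500' };

const settings = {
  webBrowse: readFlag(env, 'SMALLCODE_WEB_BROWSE', false),
  shellPersist: readFlag(env, 'SMALLCODE_SHELL_PERSIST', true),
  thinkingBudget: readInt(env, 'SMALLCODE_THINKING_BUDGET', 2000),
  knowledgeMaxTokens: readInt(env, 'SMALLCODE_KNOWLEDGE_MAX_TOKENS', 1500),
};
console.log(settings);
```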

TUI Commands

Command	Description
`/quit`, `/q`	Exit SmallCode.
`/clear`	Reset the conversation.
`/stats`	Show session statistics.
`/tokens`	Detailed token usage report.
`/budget`	Context window usage with a visual bar.
`/trace`	List, view, or export execution traces.
`/eval`	Run prompt evaluation suites.
`/memory`	Show working memory.
`/plan`	Show the current task plan.
`/model`	Show or switch the active model.
`/profile`	Show detected model profile and routing mode.
`/cognition`	Show MarrowScript cognition layer status.
`/mcp`	Show connected external MCP servers.
`/skill`	Manage reusable skills.
`/plugin`	Install or manage plugins.
`/sessions`	List and resume saved sessions.
`/help`	Show all commands.

Available Tools

Tool	Description
`read_file`	Read file contents.
`write_file`	Create or overwrite files.
`patch`	Search-and-replace edit.
`bash`	Run shell commands.
`search`	Regex search via ripgrep.
`find_files`	Glob file search.
`graph_search`	Code graph symbol search.
`explain_symbol`	Full symbol explanation with callers and callees.
`memory_load`	Load relevant project memory.
`memory_remember`	Save knowledge to memory.
`bone_compile`	Compile a `.bone` file to a full backend project.
`bone_check`	Validate a `.bone` file for type errors and constraints.
`list_projects`	List all indexed projects with stats.
`web_search`	Search via DuckDuckGo (requires `SMALLCODE_WEB_BROWSE=true`).
`web_fetch`	Fetch and extract text from a URL (requires `SMALLCODE_WEB_BROWSE=true`).
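
Small models often wrap a call to one of these tools in prose, tags, or slightly different key names. The sketch below shows one way a forgiving parser might recover such a call. It only covers JSON-bearing output (plain JSON, JSON inside Hermes-style tool_call tags, or JSON embedded in text), not YAML or XML. All function names, the accepted key aliases, and the `path` and `command` argument names in the examples are assumptions for illustration.

```javascript
// Find the first balanced {...} span in free-form model output.
// Brace counting skips braces that appear inside JSON strings.
function extractJsonObject(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0, inString = false, escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return text.slice(start, i + 1);
  }
  return null; // unbalanced braces, likely truncated output
}

// Normalize a tool call to { name, args }, or return null if none is found.
function parseToolCall(text) {
  const json = extractJsonObject(text);
  if (json === null) return null;
  let obj;
  try { obj = JSON.parse(json); } catch { return null; }
  const name = obj.name ?? obj.tool ?? obj.function;
  let args = obj.arguments ?? obj.args ?? obj.parameters ?? {};
  // Repair one common error: arguments emitted as a JSON-encoded string.
  if (typeof args === 'string') {
    try { args = JSON.parse(args); } catch { return null; }
  }
  return typeof name === 'string' ? { name, args } : null;
}

const reply = 'Sure!\n<tool_call>{"name": "read_file", "arguments": {"path": "a.js"}}</tool_call>';
console.log(parseToolCall(reply));
```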

Programmatic API

The RunResult object returned by run() contains the response text, tool call records, files created and edited, token usage, duration, and success status.

const { SmallCode } = require('smallcode');

async function main() {
  const agent = new SmallCode({
    model: 'gemma-4-e4b',
    baseUrl: 'http://localhost:1234/v1',
  });
  // Register listeners before run() so events from this run are observed.
  agent.on('tool_start', ({ name, args }) => console.log(`Using: ${name}`));
  agent.on('tool_end', ({ name, ms }) => console.log(`Done: ${name} (${ms}ms)`));
  // await is not allowed at the top level of a CommonJS module, hence main().
  const result = await agent.run("create hello.py that prints hello world");
  console.log(result.filesCreated);   // ['hello.py']
  console.log(result.success);        // true
}

main().catch(console.error);
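
For the CI use case mentioned under Features, one possible wrapper turns a run into an exit code. This sketch relies only on the run() method and the success and filesCreated fields shown in the example above. The function name runTaskForCi and the exit-code convention (0 success, 1 task failed, 2 agent error) are assumptions of this sketch.

```javascript
// Run one task and map the outcome to a process exit code.
// `agent` is any object with an async run(task) method that resolves to
// an object carrying `success` and `filesCreated`.
async function runTaskForCi(agent, task, log = console.log) {
  try {
    const result = await agent.run(task);
    const files = (result.filesCreated ?? []).join(', ') || '(none)';
    log(`files created: ${files}`);
    return result.success ? 0 : 1;
  } catch (err) {
    log(`agent error: ${err.message}`);
    return 2;
  }
}

// In a pipeline (needs the smallcode package and a running model server):
// const { SmallCode } = require('smallcode');
// const agent = new SmallCode({ model: '...', baseUrl: '...' });
// runTaskForCi(agent, 'add a unit test for utils.js')
//   .then((code) => { process.exitCode = code; });
```

Taking the agent as a parameter keeps the wrapper testable with a stub, so the pipeline logic can be checked without loading a model.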

SmallCode vs OpenCode

	SmallCode	OpenCode
Best for	Local small-model coding.	Frontier-model coding.
Model type	8B to 35B local models.	Claude, GPT, and stronger models.
Setup	Local endpoint required.	Cloud model access required.
Privacy	Local-first.	Cloud-first.
Context	Budget-managed.	Large-context oriented.
Tool calls	Forgiving parser.	Cleaner model output expected.
Editing	Patch-first.	Full-file edits.
Planning	TODO-based steps.	Model-driven flow.
Choose it when	You need local control.	You need stronger models.

SmallCode vs Claude Code

	SmallCode	Claude Code
Best for	Local small-model coding.	Claude-powered coding.
Model type	8B to 35B local models.	Claude models.
Setup	Local endpoint required.	Claude access required.
Privacy	Local-first.	Cloud-first.
Context	Budget-managed.	Claude-managed.
Tool calls	Forgiving parser.	Native agent tools.
Editing	Patch-first.	Broader file edits.
Planning	TODO-based steps.	Agentic task flow.
Choose it when	You need local control.	You need stronger output.

Alternatives and Related Resources

Best CLI AI Coding Agents: Compare CLI AI coding agents before choosing a local workflow.
Free AI Tools for Developers: Find developer-focused AI tools beyond terminal coding agents.
Pi Coding Agent: Compare SmallCode with another AI coding agent.
DeepSeek TUI: Explore a DeepSeek-focused open-source coding agent.
Claude Code Resource List: Find Claude Code agents, skills, plugins, and related developer resources.
Claude Code Commands Cheat Sheet: Review command-focused workflows for Claude Code users.

Pros

Fully local: no data leaves your machine by default.
Free and open-source under the MIT license.
Prebuilt binaries need no Node.js or build tools.
Works with any OpenAI-compatible endpoint.
Context budgeting prevents window overflow on small models.
Patch-first editing reduces hallucination risk.
Evidence store learns from past session failures.
Snapshot and rollback protect against failed edits.

Cons

Models under 4B parameters are not supported.
Web browsing tools need a 20B+ model for reliable synthesis.
Cloud escalation requires a paid API key from Anthropic, OpenAI, or DeepSeek.
No GUI. Terminal only.

FAQs

Q: Is SmallCode free?
A: SmallCode is free and open-source under the MIT license. You need your own local model server or optional API keys for cloud escalation.

Q: Can SmallCode run fully locally?
A: SmallCode runs fully locally when the model endpoint is on the same machine or local network. Cloud escalation and web browsing are optional, and enabling either one sends requests outside your network, so the workflow is no longer local-only.

Q: What makes SmallCode different from other AI coding agents?
A: SmallCode focuses on smaller local models through budget-managed context, forgiving tool parsing, TODO-driven planning, search-and-replace patches, working memory, and loop detection.

Q: Can SmallCode browse the web during coding tasks?
A: SmallCode can use the web search and web fetch tools once SMALLCODE_WEB_BROWSE=true is set. The feature is disabled by default.

Q: How does SmallCode handle a model that keeps failing on the same task?
A: SmallCode tracks consecutive failures per tool in a session. A tool that fails three times in a row gets demoted in the schema list. A tool that fails five times gets removed from the schema for the session. The early-stop detector also catches repetition loops and patch spirals and intervenes before the token cost grows further.
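
The per-tool failure tracking in the last answer can be sketched as a small counter. The thresholds (three consecutive failures to demote, five to remove) come from that answer. The class and method names are hypothetical, and two details are assumptions: "demoted" is read as "moved to the end of the tool list", and a success is taken to reset the failure streak.

```javascript
const DEMOTE_AFTER = 3; // consecutive failures before a tool is demoted
const REMOVE_AFTER = 5; // consecutive failures before a tool is removed

class ToolTrust {
  constructor(toolNames) {
    this.streak = new Map(toolNames.map((n) => [n, 0]));
    this.removed = new Set(); // removal lasts for the rest of the session
  }

  // Record the outcome of one tool call.
  record(name, ok) {
    const next = ok ? 0 : (this.streak.get(name) ?? 0) + 1;
    this.streak.set(name, next);
    if (next >= REMOVE_AFTER) this.removed.add(name);
  }

  // Tools still offered to the model: healthy ones first, demoted ones last.
  schemaOrder() {
    const active = [...this.streak.keys()].filter((n) => !this.removed.has(n));
    const healthy = active.filter((n) => this.streak.get(n) < DEMOTE_AFTER);
    const demoted = active.filter((n) => this.streak.get(n) >= DEMOTE_AFTER);
    return [...healthy, ...demoted];
  }
}

const trust = new ToolTrust(['patch', 'bash', 'search']);
for (let i = 0; i < 3; i++) trust.record('patch', false);
console.log(trust.schemaOrder()); // patch moves behind bash and search
```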