Universal AI coding tool — use GPT-5.4, Gemini 3.1, DeepSeek R1, Codex, Llama, and 300+ models through one interface.
AnyModel is an AI coding assistant that works with any model. It includes a proxy that routes requests to OpenRouter (300+ cloud models), local backends (Ollama, LMStudio, llama.cpp), or any OpenAI-compatible API — with smart retries, format translation, and zero dependencies.
anymodel.dev — full docs, presets, and FAQ.
# Terminal 1 — start AnyModel proxy with a model:
OPENROUTER_API_KEY=sk-or-v1-your-key npx anymodel proxy deepseek
# Terminal 2 — launch AnyModel:
npx anymodelThe model is set on the proxy via preset or --model. Connecting is always just npx anymodel.
Get your free OpenRouter key at openrouter.ai/keys — no credit card for free models.
# Paid models:
npx anymodel proxy gpt # → openai/gpt-5.4 (paid)
npx anymodel proxy codex # → openai/gpt-5.3-codex (paid, coding)
npx anymodel proxy gemini # → google/gemini-3.1-flash-lite-preview (paid)
npx anymodel proxy deepseek # → deepseek/deepseek-r1-0528 (paid)
npx anymodel proxy mistral # → mistralai/devstral-2512 (paid, coding)
npx anymodel proxy gemma # → google/gemma-4-31b-it (paid, coding)
# Free models:
npx anymodel proxy qwen # → qwen/qwen3-coder:free (free)
npx anymodel proxy nemotron # → nvidia/nemotron-3-super-120b-a12b:free (free)
npx anymodel proxy llama # → meta-llama/llama-3.3-70b-instruct:free (free)Or any of 300+ models: npx anymodel proxy --model mistralai/codestral-2508
AnyModel client → anymodel proxy (:9090) → OpenRouter / Ollama / LMStudio / llama.cpp
The proxy intercepts requests, strips provider-specific fields, handles retries with exponential backoff, and streams responses back.
Run separate instances on different ports:
npx anymodel proxy --port 9090 --model openai/gpt-5.4
npx anymodel proxy --port 9091 --model deepseek/deepseek-r1-0528
npx anymodel proxy --port 9092 --model google/gemini-3.1-flash-lite-previewNo internet, no API key — run everything on your machine. AnyModel treats Ollama, LMStudio, and llama.cpp as first-class backends, each with its own preset:
npx anymodel proxy ollama --model gemma3n # Ollama (:11434)
npx anymodel proxy lmstudio --model qwen3-coder # LMStudio (:1234/v1)
npx anymodel proxy llamacpp --model my-model # llama.cpp (:8080/v1)| Backend | Port | API | Best for |
|---|---|---|---|
| Ollama | 11434 |
Native (think:false suppresses reasoning-token waste on qwen3/deepseek) |
One-line model pulls, managed model library |
| LMStudio | 1234/v1 |
OpenAI-compatible | GUI model browser, easy swapping between loaded models |
| llama.cpp | 8080/v1 |
OpenAI-compatible | Rawest/smallest footprint, max control (context, GPU layers, batch, quantization) |
GGUF portability: The same GGUF model file runs across all three — only the wrapper UX differs. Download once, use anywhere. llama.cpp is the inference engine under Ollama and LMStudio.
Override endpoints via env:
LMSTUDIO_BASE_URL=http://192.168.1.50:1234/v1 npx anymodel proxy lmstudio
LLAMACPP_BASE_URL=http://localhost:9000/v1 npx anymodel proxy llamacppAuto-detection priority when no preset is given: OpenRouter key → OpenAI key → Ollama → LMStudio → llama.cpp.
When you connect to a local provider, AnyModel automatically suppresses your globally-configured MCP servers — which are usually the single biggest cause of slow first-response times (50–60 K tokens of tool schemas that local models can't handle).
-
npx anymodelon a local provider → loads project./.claude/.mcp.jsonif present, else no MCP - Keeps project skills, agents, CLAUDE.md
- Remote providers (openrouter, openai) unchanged
- Opt out:
--full-mcpflag orANYMODEL_FULL_MCP=1
See LOCAL_SETUP.md for the full guide, including 32 K context setup and full isolation.
SKILL.md is one shared open standard — Claude Code, OpenAI/Codex, Gemini/Antigravity, Cursor, and Copilot all read the same format (a <name>/SKILL.md directory with YAML frontmatter + Markdown body). AnyModel auto-discovers your skills no matter which tool's convention you used, with zero format translation.
At launch, AnyModel scans these roots in both the project working directory and $HOME:
.claude/skills/ .agents/skills/ .codex/skills/ .gemini/skills/ .agent/skills/
Each discovered skill is symlinked into a per-session temp .claude/skills shadow that is passed to the client via --add-dir, so the client's native SKILL.md reader and progressive disclosure handle everything.
-
Project wins on collision — a project
.claude/skills/<name>shadows a foreign-root skill of the same name. - Duplicates and unlinkable skills are logged — foreign-root name collisions and any skills that can't be symlinked are surfaced, not silently dropped.
-
Add or override roots with
ANYMODEL_SKILL_ROOTS— a colon-separated list of absolute paths merged into discovery.
ANYMODEL_SKILL_ROOTS=/opt/shared/skills:/Users/me/extra/skills npx anymodelWorks with OpenAI, Azure, Together, Groq, vLLM, and any OpenAI-compatible endpoint:
OPENAI_API_KEY=sk-your-key npx anymodel proxy openai --model gpt-4o
# Terminal 2:
npx anymodelBidirectional translation: Anthropic Messages API ↔ OpenAI Chat Completions.
Claude Code --effort / /effort is forwarded as OpenAI reasoning_effort for compatible OpenAI reasoning/codex models on the official OpenAI API. Local OpenAI-compatible servers do not receive it by default; set ANYMODEL_FORWARD_EFFORT=1 only if your endpoint accepts that field.
anymodel # launch AnyModel (connect to proxy)
anymodel proxy <preset> # start proxy with preset
anymodel proxy --model <id> # start proxy with any model
anymodel proxy ollama --model <name> # proxy with local Ollama (:11434)
anymodel proxy lmstudio --model <id> # proxy with LMStudio (:1234/v1)
anymodel proxy llamacpp --model <id> # proxy with llama.cpp (:8080/v1)
anymodel claude # run with native Claude (no proxy)
Options:
--model, -m Model ID
--port, -p Port (default: 9090)
--free-only Block paid models
--token, -t Require auth token for requests
--rpm Rate limit requests/min (default: 60)
--help, -h Help
When proxying to Ollama, AnyModel automatically applies several optimizations to make local models work well with coding tools:
-
System prompt condensing — AI tool prompts are 50-100KB; AnyModel condenses them to fit Ollama's context window (
OLLAMA_MAX_SYSTEM_CHARS) -
Tool description trimming — truncates verbose tool descriptions to save context (
OLLAMA_MAX_TOOL_DESC, default 100 chars) -
Tool count limiting — limits tools sent to the model, always keeping core tools (Bash/Read/Write/Edit/Grep/Glob) (
OLLAMA_MAX_TOOLS) - Prefix-aware caching — stabilizes system prompt + tool ordering for Ollama KV cache reuse across requests, with date normalization and description-independent hashing
- HTTP keep-alive — reuses TCP connections to Ollama
-
count_tokens mock — responds to
/v1/messages/count_tokenslocally, preventing cascading 500 errors
| Variable | Default | Description |
|---|---|---|
OPENROUTER_API_KEY |
— | Your OpenRouter key (get one free) |
OPENROUTER_MODEL |
— | Default model override |
OPENAI_API_KEY |
— | Key for OpenAI-compatible APIs |
OPENAI_BASE_URL |
https://api.openai.com/v1 |
Custom endpoint for the openai provider |
LMSTUDIO_BASE_URL |
http://localhost:1234/v1 |
LMStudio endpoint override |
LLAMACPP_BASE_URL |
http://localhost:8080/v1 |
llama.cpp (llama-server) endpoint override |
PROXY_PORT |
9090 |
Proxy port |
ANYMODEL_CLIENT |
— | Path to custom Claude-compatible client; otherwise AnyModel uses bundled cli.js, cwd cli.js, then global claude
|
ANYMODEL_TOKEN |
— | Auth token for remote mode |
ANYMODEL_SKILL_ROOTS |
— | Colon-separated absolute paths added to skill discovery roots |
ANYMODEL_FORWARD_EFFORT |
auto |
1/0 override for forwarding Claude effort as OpenAI reasoning_effort
|
OLLAMA_NUM_CTX |
8192 |
Ollama context window size |
OLLAMA_KEEP_ALIVE |
30m |
How long Ollama keeps model in GPU memory |
OLLAMA_MAX_SYSTEM_CHARS |
4000 |
System prompt condensing threshold |
OLLAMA_MAX_MSG_CHARS |
max(4000, num_ctx*3) |
Message history threshold |
OLLAMA_TOOLS |
auto |
Tool capability: auto/on/off |
OLLAMA_MAX_TOOLS |
0 (unlimited) |
Max tools to send (core tools always kept) |
OLLAMA_MAX_TOOL_DESC |
100 |
Max tool description length in chars |
OPENROUTER_API_KEY is only needed when starting the proxy. OLLAMA_* variables only apply to the Ollama provider.
- anymodel.dev — Homepage, docs, FAQ
- OpenRouter — Get your API key
- npm — Package
- YouTube — Demos and tutorials
MIT — Anton Abyzov
