anymodel

1.17.0 • Public • Published

AnyModel

Universal AI coding tool — use GPT-5.4, Gemini 3.1, DeepSeek R1, Codex, Llama, and 300+ models through one interface.

AnyModel is an AI coding assistant that works with any model. It includes a proxy that routes requests to OpenRouter (300+ cloud models), local backends (Ollama, LMStudio, llama.cpp), or any OpenAI-compatible API — with smart retries, format translation, and zero dependencies.

anymodel.dev — full docs, presets, and FAQ.


Quick Start

# Terminal 1 — start AnyModel proxy with a model:
OPENROUTER_API_KEY=sk-or-v1-your-key npx anymodel proxy deepseek

# Terminal 2 — launch AnyModel:
npx anymodel

The model is set on the proxy via preset or --model. Connecting is always just npx anymodel.

Get your free OpenRouter key at openrouter.ai/keys — no credit card for free models.


Presets

# Paid models:
npx anymodel proxy gpt        # → openai/gpt-5.4                       (paid)
npx anymodel proxy codex      # → openai/gpt-5.3-codex                 (paid, coding)
npx anymodel proxy gemini     # → google/gemini-3.1-flash-lite-preview  (paid)
npx anymodel proxy deepseek   # → deepseek/deepseek-r1-0528            (paid)
npx anymodel proxy mistral    # → mistralai/devstral-2512               (paid, coding)
npx anymodel proxy gemma      # → google/gemma-4-31b-it                (paid, coding)

# Free models:
npx anymodel proxy qwen       # → qwen/qwen3-coder:free                (free)
npx anymodel proxy nemotron   # → nvidia/nemotron-3-super-120b-a12b:free (free)
npx anymodel proxy llama      # → meta-llama/llama-3.3-70b-instruct:free (free)

Or any of 300+ models: npx anymodel proxy --model mistralai/codestral-2508

How It Works

AnyModel client → anymodel proxy (:9090) → OpenRouter / Ollama / LMStudio / llama.cpp

The proxy intercepts requests, strips provider-specific fields, handles retries with exponential backoff, and streams responses back.
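The retry behavior mentioned above can be pictured as a delay schedule. This is a minimal sketch, not AnyModel's actual implementation: the function name `backoffDelays`, the 500 ms base delay, and the 8 s cap are assumptions chosen for illustration.

```javascript
// Hypothetical helper: compute the wait (in ms) before each retry attempt.
// The delay doubles on every attempt and is capped at capMs.
function backoffDelays(maxRetries, baseMs = 500, capMs = 8000) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

console.log(backoffDelays(6)); // [ 500, 1000, 2000, 4000, 8000, 8000 ]
```

Retry loops of this kind often add random jitter to each delay so that many clients do not retry in lockstep.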

Multiple Models at Once

Run separate instances on different ports:

npx anymodel proxy --port 9090 --model openai/gpt-5.4
npx anymodel proxy --port 9091 --model deepseek/deepseek-r1-0528
npx anymodel proxy --port 9092 --model google/gemini-3.1-flash-lite-preview

Local Backends

No internet, no API key — run everything on your machine. AnyModel treats Ollama, LMStudio, and llama.cpp as first-class backends, each with its own preset:

npx anymodel proxy ollama --model gemma3n            # Ollama    (:11434)
npx anymodel proxy lmstudio --model qwen3-coder      # LMStudio  (:1234/v1)
npx anymodel proxy llamacpp --model my-model         # llama.cpp (:8080/v1)

| Backend | Port | API | Best for |
|---|---|---|---|
| Ollama | 11434 | Native (think:false suppresses reasoning-token waste on qwen3/deepseek) | One-line model pulls, managed model library |
| LMStudio | 1234/v1 | OpenAI-compatible | GUI model browser, easy swapping between loaded models |
| llama.cpp | 8080/v1 | OpenAI-compatible | Rawest/smallest footprint, max control (context, GPU layers, batch, quantization) |

GGUF portability: The same GGUF model file runs across all three — only the wrapper UX differs. Download once, use anywhere. llama.cpp is the inference engine under Ollama and LMStudio.

Override endpoints via env:

LMSTUDIO_BASE_URL=http://192.168.1.50:1234/v1 npx anymodel proxy lmstudio
LLAMACPP_BASE_URL=http://localhost:9000/v1    npx anymodel proxy llamacpp

Auto-detection priority when no preset is given: OpenRouter key → OpenAI key → Ollama → LMStudio → llama.cpp.
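The priority order above can be read as a simple first-match rule. The sketch below is only an illustration: `pickProvider` is a hypothetical name, and passing in a set of local backends that responded to a probe is an assumption about how detection might work.

```javascript
// Hypothetical sketch of the documented priority order.
// env: an object shaped like process.env.
// reachable: names of local backends that answered a probe (assumed mechanism).
function pickProvider(env, reachable = new Set()) {
  if (env.OPENROUTER_API_KEY) return "openrouter";
  if (env.OPENAI_API_KEY) return "openai";
  for (const local of ["ollama", "lmstudio", "llamacpp"]) {
    if (reachable.has(local)) return local;
  }
  return null; // nothing detected
}

console.log(pickProvider({ OPENAI_API_KEY: "sk-test" }, new Set(["ollama"]))); // openai
console.log(pickProvider({}, new Set(["llamacpp", "lmstudio"])));              // lmstudio
```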

Local-provider smart defaults (1.11.0+)

When you connect to a local provider, AnyModel automatically suppresses your globally configured MCP servers, which are usually the single biggest cause of slow first responses (50–60K tokens of tool schemas that local models can't handle).

  • npx anymodel on a local provider → loads project ./.claude/.mcp.json if present, else no MCP
  • Keeps project skills, agents, CLAUDE.md
  • Remote providers (openrouter, openai) unchanged
  • Opt out: --full-mcp flag or ANYMODEL_FULL_MCP=1
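The rules in this list amount to a small decision function. The following is a sketch under stated assumptions: `mcpMode`, its option names, and the return values `"full"`, `"project"`, and `"none"` are hypothetical labels, not part of AnyModel's interface.

```javascript
// Hypothetical decision function; names and return values are illustrative only.
const LOCAL_PROVIDERS = new Set(["ollama", "lmstudio", "llamacpp"]);

function mcpMode(provider, { fullMcp = false, projectConfigExists = false } = {}) {
  // Remote providers and explicit opt-outs keep the full MCP configuration.
  if (!LOCAL_PROVIDERS.has(provider) || fullMcp) return "full";
  // Local providers fall back to the project config, or to no MCP at all.
  return projectConfigExists ? "project" : "none";
}

console.log(mcpMode("ollama"));                                  // none
console.log(mcpMode("lmstudio", { projectConfigExists: true })); // project
console.log(mcpMode("ollama", { fullMcp: true }));               // full
console.log(mcpMode("openrouter"));                              // full
```

Here `fullMcp` stands in for either the --full-mcp flag or ANYMODEL_FULL_MCP=1.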

See LOCAL_SETUP.md for the full guide, including 32 K context setup and full isolation.

Universal Skills (1.16.0+)

SKILL.md is one shared open standard — Claude Code, OpenAI/Codex, Gemini/Antigravity, Cursor, and Copilot all read the same format (a <name>/SKILL.md directory with YAML frontmatter + Markdown body). AnyModel auto-discovers your skills no matter which tool's convention you used, with zero format translation.

At launch, AnyModel scans these roots in both the project working directory and $HOME:

.claude/skills/    .agents/skills/    .codex/skills/    .gemini/skills/    .agent/skills/

Each discovered skill is symlinked into a per-session temp .claude/skills shadow that is passed to the client via --add-dir, so the client's native SKILL.md reader and progressive disclosure handle everything.

  • Project wins on collision — a project .claude/skills/<name> shadows a foreign-root skill of the same name.
  • Duplicates and unlinkable skills are logged — foreign-root name collisions and any skills that can't be symlinked are surfaced, not silently dropped.
  • Add or override roots with ANYMODEL_SKILL_ROOTS, a colon-separated list of absolute paths merged into discovery:

ANYMODEL_SKILL_ROOTS=/opt/shared/skills:/Users/me/extra/skills npx anymodel
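The discovery rules above can be sketched in a few lines. Everything here is hypothetical: the function names, the `{ name, root, project }` record shape, and the choice to keep the first-discovered skill when two foreign roots collide are assumptions (the text above only says such collisions are logged).

```javascript
// Hypothetical sketch: split ANYMODEL_SKILL_ROOTS and resolve name collisions.
function extraRoots(value = "") {
  // Colon-separated list; keep only non-empty absolute paths.
  return value.split(":").filter((p) => p.startsWith("/"));
}

// found: [{ name, root, project }] in discovery order.
// A project .claude/skills entry replaces a same-named skill from another root;
// otherwise the first one discovered is kept (assumed) and the duplicate is reported.
function resolveSkills(found) {
  const chosen = new Map();
  const duplicates = [];
  for (const skill of found) {
    const current = chosen.get(skill.name);
    const isProjectClaude = skill.project && skill.root === ".claude/skills";
    if (!current) {
      chosen.set(skill.name, skill);
    } else if (isProjectClaude) {
      duplicates.push(current);
      chosen.set(skill.name, skill);
    } else {
      duplicates.push(skill);
    }
  }
  return { chosen: [...chosen.values()], duplicates };
}

const result = resolveSkills([
  { name: "review", root: ".codex/skills", project: false },
  { name: "review", root: ".claude/skills", project: true },
]);
console.log(result.chosen[0].root); // .claude/skills
console.log(extraRoots("/opt/shared/skills:/Users/me/extra/skills"));
```

Returning the duplicates instead of dropping them mirrors the stated goal of surfacing collisions rather than hiding them.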

OpenAI-Compatible APIs

Works with OpenAI, Azure, Together, Groq, vLLM, and any OpenAI-compatible endpoint:

OPENAI_API_KEY=sk-your-key npx anymodel proxy openai --model gpt-4o

# Terminal 2:
npx anymodel

Bidirectional translation: Anthropic Messages API ↔ OpenAI Chat Completions.
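To make the request direction of that translation concrete, here is a minimal sketch that converts a text-only Anthropic Messages request into an OpenAI Chat Completions request. It is not AnyModel's translator: `anthropicToOpenAI` is a hypothetical name, only a string `system` prompt is handled, and tool calls, images, and streaming are left out.

```javascript
// Hypothetical sketch: text-only Anthropic Messages -> OpenAI Chat Completions.
function anthropicToOpenAI(req) {
  const messages = [];
  // Anthropic carries the system prompt as a top-level field;
  // Chat Completions expects it as the first message.
  if (typeof req.system === "string" && req.system.length > 0) {
    messages.push({ role: "system", content: req.system });
  }
  for (const msg of req.messages) {
    // Anthropic content is either a string or an array of typed blocks.
    const text = typeof msg.content === "string"
      ? msg.content
      : msg.content.filter((b) => b.type === "text").map((b) => b.text).join("\n");
    messages.push({ role: msg.role, content: text });
  }
  return { model: req.model, max_tokens: req.max_tokens, messages };
}

console.log(anthropicToOpenAI({
  model: "gpt-4o",
  max_tokens: 256,
  system: "You are a coding assistant.",
  messages: [{ role: "user", content: [{ type: "text", text: "Hello" }] }],
}).messages);
```

The response direction would do the reverse, wrapping the assistant message text back into Anthropic-style content blocks.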

Claude Code --effort / /effort is forwarded as OpenAI reasoning_effort for compatible OpenAI reasoning/codex models on the official OpenAI API. Local OpenAI-compatible servers do not receive it by default; set ANYMODEL_FORWARD_EFFORT=1 only if your endpoint accepts that field.
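The forwarding rule can be summarized as a three-way switch. This sketch is an assumption about the shape of that logic: `shouldForwardEffort` is a hypothetical name, the hostname comparison is one plausible way to recognize the official OpenAI API, and the model-compatibility check described above is omitted.

```javascript
// Hypothetical sketch of the forwarding rule; the model check is left out.
function shouldForwardEffort(env, baseUrl = "https://api.openai.com/v1") {
  if (env.ANYMODEL_FORWARD_EFFORT === "1") return true;  // forced on
  if (env.ANYMODEL_FORWARD_EFFORT === "0") return false; // forced off
  // Default ("auto"): forward only to the official OpenAI API.
  return new URL(baseUrl).hostname === "api.openai.com";
}

console.log(shouldForwardEffort({}));                                            // true
console.log(shouldForwardEffort({}, "http://localhost:1234/v1"));                // false
console.log(shouldForwardEffort({ ANYMODEL_FORWARD_EFFORT: "1" }, "http://localhost:1234/v1")); // true
```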

CLI Reference

anymodel                              # launch AnyModel (connect to proxy)
anymodel proxy <preset>               # start proxy with preset
anymodel proxy --model <id>           # start proxy with any model
anymodel proxy ollama --model <name>  # proxy with local Ollama    (:11434)
anymodel proxy lmstudio --model <id>  # proxy with LMStudio        (:1234/v1)
anymodel proxy llamacpp --model <id>  # proxy with llama.cpp       (:8080/v1)
anymodel claude                       # run with native Claude (no proxy)

Options:
  --model, -m     Model ID
  --port, -p      Port (default: 9090)
  --free-only     Block paid models
  --token, -t     Require auth token for requests
  --rpm           Rate limit requests/min (default: 60)
  --help, -h      Help

Ollama Performance Optimizations

When proxying to Ollama, AnyModel automatically applies several optimizations to make local models work well with coding tools:

  • System prompt condensing — AI tool prompts are 50-100KB; AnyModel condenses them to fit Ollama's context window (OLLAMA_MAX_SYSTEM_CHARS)
  • Tool description trimming — truncates verbose tool descriptions to save context (OLLAMA_MAX_TOOL_DESC, default 100 chars)
  • Tool count limiting — limits tools sent to the model, always keeping core tools (Bash/Read/Write/Edit/Grep/Glob) (OLLAMA_MAX_TOOLS)
  • Prefix-aware caching — stabilizes system prompt + tool ordering for Ollama KV cache reuse across requests, with date normalization and description-independent hashing
  • HTTP keep-alive — reuses TCP connections to Ollama
  • count_tokens mock — responds to /v1/messages/count_tokens locally, preventing cascading 500 errors
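Two of the optimizations above, tool description trimming and tool count limiting, can be sketched together. This is an illustration rather than AnyModel's code: `trimTools` and its option names are hypothetical, and placing core tools first in the output is an assumption made so they survive the limit.

```javascript
// Hypothetical sketch of tool trimming; not AnyModel's actual code.
const CORE_TOOLS = new Set(["Bash", "Read", "Write", "Edit", "Grep", "Glob"]);

function trimTools(tools, { maxTools = 0, maxDesc = 100 } = {}) {
  // Core tools come first so they are never cut by the count limit.
  const core = tools.filter((t) => CORE_TOOLS.has(t.name));
  const rest = tools.filter((t) => !CORE_TOOLS.has(t.name));
  // maxTools = 0 means unlimited; otherwise fill the remaining slots.
  const extraSlots = maxTools > 0 ? Math.max(0, maxTools - core.length) : rest.length;
  return [...core, ...rest.slice(0, extraSlots)].map((t) => ({
    ...t,
    description: (t.description || "").slice(0, maxDesc),
  }));
}

const tools = [
  { name: "WebFetch", description: "x".repeat(300) },
  { name: "Bash", description: "Run a shell command" },
  { name: "Read", description: "Read a file" },
  { name: "Todo", description: "Track tasks" },
];
console.log(trimTools(tools, { maxTools: 3 }).map((t) => t.name)); // [ 'Bash', 'Read', 'WebFetch' ]
```

The `maxTools` and `maxDesc` options correspond to OLLAMA_MAX_TOOLS and OLLAMA_MAX_TOOL_DESC.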

Environment Variables

| Variable | Default | Description |
|---|---|---|
| OPENROUTER_API_KEY | | Your OpenRouter key (get one free) |
| OPENROUTER_MODEL | | Default model override |
| OPENAI_API_KEY | | Key for OpenAI-compatible APIs |
| OPENAI_BASE_URL | https://api.openai.com/v1 | Custom endpoint for the openai provider |
| LMSTUDIO_BASE_URL | http://localhost:1234/v1 | LMStudio endpoint override |
| LLAMACPP_BASE_URL | http://localhost:8080/v1 | llama.cpp (llama-server) endpoint override |
| PROXY_PORT | 9090 | Proxy port |
| ANYMODEL_CLIENT | | Path to custom Claude-compatible client; otherwise AnyModel uses bundled cli.js, cwd cli.js, then global claude |
| ANYMODEL_TOKEN | | Auth token for remote mode |
| ANYMODEL_SKILL_ROOTS | | Colon-separated absolute paths added to skill discovery roots |
| ANYMODEL_FORWARD_EFFORT | auto | 1/0 override for forwarding Claude effort as OpenAI reasoning_effort |
| OLLAMA_NUM_CTX | 8192 | Ollama context window size |
| OLLAMA_KEEP_ALIVE | 30m | How long Ollama keeps model in GPU memory |
| OLLAMA_MAX_SYSTEM_CHARS | 4000 | System prompt condensing threshold |
| OLLAMA_MAX_MSG_CHARS | max(4000, num_ctx*3) | Message history threshold |
| OLLAMA_TOOLS | auto | Tool capability: auto/on/off |
| OLLAMA_MAX_TOOLS | 0 (unlimited) | Max tools to send (core tools always kept) |
| OLLAMA_MAX_TOOL_DESC | 100 | Max tool description length in chars |

OPENROUTER_API_KEY is only needed when starting the proxy. OLLAMA_* variables only apply to the Ollama provider.
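As a worked example of the OLLAMA_MAX_MSG_CHARS default, max(4000, num_ctx*3), the sketch below resolves both values from an env-like object. The function name `ollamaLimits` is hypothetical, and treating unset or non-numeric values as "use the default" is an assumption.

```javascript
// Hypothetical sketch: resolve two Ollama settings using the defaults listed above.
function ollamaLimits(env) {
  const numCtx = Number(env.OLLAMA_NUM_CTX) || 8192;
  const maxMsgChars = Number(env.OLLAMA_MAX_MSG_CHARS) || Math.max(4000, numCtx * 3);
  return { numCtx, maxMsgChars };
}

console.log(ollamaLimits({}));                          // { numCtx: 8192, maxMsgChars: 24576 }
console.log(ollamaLimits({ OLLAMA_NUM_CTX: "1024" }));  // { numCtx: 1024, maxMsgChars: 4000 }
console.log(ollamaLimits({ OLLAMA_NUM_CTX: "32768" })); // { numCtx: 32768, maxMsgChars: 98304 }
```

With the default 8192-token context, the message history threshold works out to 24,576 characters; the 4000-character floor only applies when the context is smaller than about 1,334 tokens.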

License

MIT — Anton Abyzov