Run Claude Code Locally or in the Cloud in 5 Minutes with Ollama, LM Studio, llama.cpp, and OpenRouter

This guide shows how to configure Claude Code to run on local or cloud models within five minutes, covering hardware requirements, recommended models, step‑by‑step installation for Ollama, llama.cpp, LM Studio, and cloud‑based options, plus performance and cost comparisons.

Hardware requirements

For programming use, 32 GB of RAM is recommended. Models with 24 B+ parameters run comfortably on a MacBook Pro M1 (32 GB) and on an Nvidia DGX Spark (120 GB, GB10 GPU). A 16 GB machine can run tiny models, but the experience is slower and more error-prone.
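
If you are unsure what your machine has, a quick check looks like the following (a minimal sketch; the nvidia-smi query assumes an NVIDIA driver is installed):

# macOS: total unified memory in bytes
sysctl -n hw.memsize

# Linux: system RAM, plus GPU memory if an NVIDIA card is present
free -h
nvidia-smi --query-gpu=memory.total --format=csv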

Recommended starter models

devstral-small-2 – 24 B – good starting point for coding quality

qwen3-coder:30b – 30 B – better coding ability, still usable on 32 GB

GLM4.7-flash:q8_0 – ~30 B (quantized) – excellent price‑performance trade‑off
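
As a rough rule of thumb (an estimate, not an official figure), a model's memory footprint is approximately parameter count × bytes per weight, plus around 20 % overhead for the KV cache and runtime. For example, a 24 B model at Q4 quantization (~0.5 bytes per weight):

awk 'BEGIN { printf "%.1f GB\n", 24e9 * 0.5 * 1.2 / 1e9 }'   # prints: 14.4 GB

This is why 24-30 B quantized models fit on a 32 GB machine while leaving room for the OS.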

Why use alternative models?

Running Claude Code against the official Anthropic API gets expensive quickly. Third-party alternatives can reduce costs by up to 98 % (DeepSeek V3.2 is the cheapest cloud option; running locally with Ollama is free).

Solution 1 – Ollama local model

Time: 5 minutes | Cost: Free | Scenario: privacy, no internet required

Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Download a model (e.g., devstral-small-2):

ollama pull devstral-small-2

Launch Claude Code with the model:

ollama launch claude --model devstral-small-2

Add environment variables (e.g., to ~/.zshrc or ~/.bashrc):

export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY="your api key"
export ANTHROPIC_BASE_URL="http://localhost:11434"

Apply the configuration and run Claude Code:

source ~/.zshrc
claude --model devstral-small-2
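
If Claude Code cannot reach the model, you can sanity-check the Ollama server directly; /api/tags is Ollama's standard endpoint for listing pulled models:

curl http://localhost:11434/api/tags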

Performance on the Mac M1 was acceptable with devstral-small-2. qwen3-coder (30 B) was too slow, while glm-4.7-flash (30 B, F16) matched the speed of Claude Opus 4.5.

Solution 2 – llama.cpp

Time: 15‑20 minutes | Cost: Free | Scenario: any HuggingFace model

Compile llama.cpp.

macOS (Apple Silicon):

brew install cmake
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DGGML_METAL=ON
cmake --build llama.cpp/build --config Release -j
cp llama.cpp/build/bin/llama-* llama.cpp/

Linux (NVIDIA GPU):

sudo apt-get update && sudo apt-get install build-essential cmake git -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j
cp llama.cpp/build/bin/llama-* llama.cpp/
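
On either platform, a quick smoke test confirms the build succeeded (assuming the binaries were copied into llama.cpp/ as above; this should print the version and build info):

./llama.cpp/llama-server --version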

Start a server with a HuggingFace model (example: bartowski/cerebras_Qwen3-Coder-REAP-25B-A3B-GGUF:Q4_K_M):

llama-server -hf bartowski/cerebras_Qwen3-Coder-REAP-25B-A3B-GGUF:Q4_K_M \
    --alias "Qwen3-Coder-REAP-25B-A3B-GGUF" \
    --port 8000 \
    --jinja \
    --kv-unified \
    --cache-type-k q8_0 --cache-type-v q8_0 \
    --flash-attn on \
    --batch-size 4096 --ubatch-size 1024 \
    --ctx-size 64000

The --jinja flag enables the model's chat template and is essential for tool calls.
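
Once the server is up, you can verify it before pointing Claude Code at it; llama-server exposes a /health endpoint and an OpenAI-compatible model list:

curl http://localhost:8000/health
curl http://localhost:8000/v1/models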

Connect Claude Code to the server:

export ANTHROPIC_BASE_URL="http://localhost:8000"
claude --model Qwen3-Coder-REAP-25B-A3B-GGUF

Solution 3 – LM Studio

Time: 5 minutes | Cost: Free | Scenario: privacy, no internet, graphical UI

Install LM Studio:

curl -fsSL https://lmstudio.ai/install.sh | bash

Download a model via the LM Studio UI (e.g., qwen3-coder).

Start the LM Studio server on a chosen port (e.g., 1234):

lms server start --port 1234

Set environment variables:

export ANTHROPIC_BASE_URL="http://localhost:1234"
export ANTHROPIC_AUTH_TOKEN="lmstudio"

Run Claude Code with the selected model:

claude --model qwen/qwen3-coder-30b
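
To verify the setup before launching Claude Code, list the downloaded models and confirm the server is answering (LM Studio serves an OpenAI-compatible API on the chosen port):

lms ls
curl http://localhost:1234/v1/models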

Solution 4 – Ollama cloud model

Time: 2 minutes | Cost: pay‑as‑you‑go | Scenario: local workflow with cloud compute

Pull a cloud‑enabled model (e.g., kimi-k2.5:cloud or minimax-m2.1:cloud):

ollama pull kimi-k2.5:cloud
ollama pull minimax-m2.1:cloud

Launch Claude Code with the cloud model:

ollama launch claude --model minimax-m2.1:cloud

The :cloud variant runs on Ollama's infrastructure with the same CLI, eliminating API-key management.
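
You can confirm the cloud variants are registered alongside your local models; they should appear in the listing like any other pulled model:

ollama list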

Solution 5 – Direct cloud‑provider API

Time: 2 minutes | Cost: pay‑as‑you‑go | Scenario: direct API access, more control

Configure Claude Code to use OpenRouter (or another provider) by setting environment variables:

export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_API_KEY=""   # leave empty for OpenRouter
export ANTHROPIC_MODEL="openai/gpt-oss-120b:free"

Run Claude Code with the chosen provider model:

claude --model openai/gpt-oss-120b:free

Similar configurations work for Minimax, DeepSeek, Kimi, or GLM by adjusting ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, and ANTHROPIC_API_KEY.
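
As an illustration, a DeepSeek configuration would look roughly like the sketch below (the endpoint and model name reflect DeepSeek's documented Anthropic-compatible API at the time of writing; check the provider's docs before use):

export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_API_KEY="sk-..."          # your DeepSeek API key
export ANTHROPIC_MODEL="deepseek-chat"
claude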

Conclusion

Claude Code is now highly flexible: it can remain on Anthropic's API, run locally on a Mac M1 with devstral-small-2 (24 B) for privacy, leverage the larger memory of an Nvidia DGX Spark for bigger models, or use inexpensive cloud providers such as Kimi, Minimax, DeepSeek, or GLM, which can be up to 98 % cheaper than Opus 4.5. The accompanying GitHub repository (https://github.com/luongnv89/claude-howto) contains the full scripts and configuration details.
