Run Claude Code Locally or in the Cloud in 5 Minutes with Ollama, LM Studio, llama.cpp, and OpenRouter
This guide shows how to configure Claude Code to run on local or cloud models within five minutes, covering hardware requirements, recommended models, step‑by‑step installation for Ollama, llama.cpp, LM Studio, and cloud‑based options, plus performance and cost comparisons.
Hardware requirements
For programming use, 32 GB RAM is recommended. Models with 24 B+ parameters run comfortably on a MacBook Pro M1 (32 GB) and on an Nvidia DGX Spark (120 GB, GB10 GPU). 16 GB can run tiny models, but results in a slower, error‑prone experience.
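A quick back‑of‑the‑envelope check explains the 32 GB recommendation (a sketch; the bytes‑per‑weight figures for each quantization are approximations, and the 24 B parameter count is illustrative):

```shell
# Rough resident-memory estimate: parameters x bytes per weight.
# Approximate bytes/weight: F16 = 2, Q8_0 ~ 1, Q4_K_M ~ 0.56
params=24000000000                            # 24 B parameters
f16_gb=$(( params * 2 / 1000000000 ))         # full 16-bit weights
q8_gb=$(( params / 1000000000 ))              # 8-bit quantization
q4_gb=$(( params * 56 / 100 / 1000000000 ))   # 4-bit quantization
echo "F16=${f16_gb}GB Q8_0=${q8_gb}GB Q4_K_M=${q4_gb}GB"
```

At F16 the weights of a 24 B model alone exceed 32 GB, while 4–8‑bit quantizations leave room for the KV cache and the OS, which is why quantized ~24–30 B models are the sweet spot here.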
Recommended starter models
devstral-small-2 – 24 B – good starting point for coding quality
qwen3-coder:30b – 30 B – better coding ability, still usable on 32 GB
GLM4.7-flash:q8_0 – ~30 B (quantized) – excellent price‑performance trade‑off
Why use alternative models?
Running Claude Code against the official Anthropic API becomes expensive quickly. Third‑party alternatives can cut costs by up to 98 % (e.g., DeepSeek V3.2 is the cheapest cloud option; Ollama’s local models are free).
Solution 1 – Ollama local model
Time: 5 minutes | Cost: Free | Scenario: privacy, no internet required
Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Download a model (e.g., devstral-small-2):
ollama pull devstral-small-2
Launch Claude Code with the model:
ollama launch claude --model devstral-small-2
Add environment variables (e.g., to ~/.zshrc or ~/.bashrc):
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY="your api key"
export ANTHROPIC_BASE_URL="http://localhost:11434"
Apply the configuration and run Claude Code:
source ~/.zshrc
claude --model devstral-small-2
Performance on the Mac M1 was acceptable with devstral-small-2. qwen3-coder (30 B) was too slow, while glm-4.7-flash (30 B, F16) matched the speed of Claude Opus 4.5.
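To avoid cluttering ~/.zshrc, the same exports can live in a dedicated file that is sourced only when needed (a sketch; the file name ~/.claude-ollama.env is an arbitrary choice, not an Ollama or Claude Code convention):

```shell
# Write the Claude Code -> Ollama settings to a standalone env file
cat > "$HOME/.claude-ollama.env" <<'EOF'
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_BASE_URL="http://localhost:11434"
EOF

# Load the settings only in the shell that runs Claude Code
. "$HOME/.claude-ollama.env"
echo "$ANTHROPIC_BASE_URL"
```

This keeps the Anthropic overrides out of shells that should keep talking to the official API.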
Solution 2 – llama.cpp
Time: 15‑20 minutes | Cost: Free | Scenario: any HuggingFace model
Compile llama.cpp.
macOS (Apple Silicon):
brew install cmake
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DGGML_METAL=ON
cmake --build llama.cpp/build --config Release -j
cp llama.cpp/build/bin/llama-* llama.cpp/
Linux (NVIDIA GPU):
sudo apt-get update && sudo apt-get install build-essential cmake git -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j
cp llama.cpp/build/bin/llama-* llama.cpp/
Start a server with a HuggingFace model (example: bartowski/cerebras_Qwen3-Coder-REAP-25B-A3B-GGUF:Q4_K_M):
llama-server -hf bartowski/cerebras_Qwen3-Coder-REAP-25B-A3B-GGUF:Q4_K_M \
--alias "Qwen3-Coder-REAP-25B-A3B-GGUF" \
--port 8000 \
--jinja \
--kv-unified \
--cache-type-k q8_0 --cache-type-v q8_0 \
--flash-attn on \
--batch-size 4096 --ubatch-size 1024 \
--ctx-size 64000
The --jinja flag is essential for tool calls.
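The q8_0 cache flags matter at a 64 k context. As a rough estimate (a sketch; the layer count, KV‑head count, and head dimension below are assumed values for a model of this class, not read from the GGUF file):

```shell
# KV cache ~ 2 (K and V) x layers x context x kv_heads x head_dim x bytes/element
layers=48 ctx=64000 kv_heads=8 head_dim=128   # assumed architecture values
elems=$(( 2 * layers * ctx * kv_heads * head_dim ))
f16_mb=$(( elems * 2 / 1000000 ))   # default f16 cache: 2 bytes/element
q8_mb=$(( elems / 1000000 ))        # q8_0 cache: ~1 byte/element
echo "KV cache: f16=${f16_mb}MB q8_0=${q8_mb}MB"
```

Quantizing the cache roughly halves its footprint, which is what leaves room for the model weights on a 32 GB machine.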
Connect Claude Code to the server:
export ANTHROPIC_BASE_URL="http://localhost:8000"
claude --model Qwen3-Coder-REAP-25B-A3B-GGUF
Solution 3 – LM Studio
Time: 5 minutes | Cost: Free | Scenario: privacy, no internet, graphical UI
Install LM Studio:
curl -fsSL https://lmstudio.ai/install.sh | bash
Download a model via the LM Studio UI (e.g., qwen3-coder).
Start the LM Studio server on a chosen port (e.g., 1234):
lms server start --port 1234
Set environment variables:
export ANTHROPIC_BASE_URL="http://localhost:1234"
export ANTHROPIC_AUTH_TOKEN="lmstudio"
Run Claude Code with the selected model:
claude --model qwen/qwen3-coder-30b
Solution 4 – Ollama cloud model
Time: 2 minutes | Cost: pay‑as‑you‑go | Scenario: local workflow with cloud compute
Pull a cloud‑enabled model (e.g., kimi-k2.5:cloud or minimax-m2.1:cloud):
ollama pull kimi-k2.5:cloud
ollama pull minimax-m2.1:cloud
Launch Claude Code with the cloud model:
ollama launch claude --model minimax-m2.1:cloud
The :cloud variant runs on Ollama’s infrastructure with the same CLI, eliminating API‑key management.
Solution 5 – Direct cloud‑provider API
Time: 2 minutes | Cost: pay‑as‑you‑go | Scenario: direct API access, more control
Configure Claude Code to use OpenRouter (or another provider) by setting environment variables:
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_API_KEY="" # leave empty for OpenRouter
export ANTHROPIC_MODEL="openai/gpt-oss-120b:free"
Run Claude Code with the chosen provider model:
claude --model openai/gpt-oss-120b:free
Similar configurations work for Minimax, DeepSeek, Kimi, or GLM by adjusting ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, and ANTHROPIC_API_KEY.
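Since every provider differs only in these three variables, switching can be wrapped in a tiny helper (a sketch; the function name use_provider is our invention, and the example values are the OpenRouter settings from this section; other providers need their real endpoint, model ID, and key):

```shell
# use_provider BASE_URL MODEL [API_KEY] -- retarget Claude Code in one call
use_provider() {
  export ANTHROPIC_BASE_URL="$1"
  export ANTHROPIC_MODEL="$2"
  export ANTHROPIC_API_KEY="${3:-}"   # empty unless the provider needs a key
}

# Example: the OpenRouter configuration used above
use_provider "https://openrouter.ai/api" "openai/gpt-oss-120b:free"
echo "$ANTHROPIC_BASE_URL $ANTHROPIC_MODEL"
```

Defining one such function per provider (or reading the values from per‑provider env files) makes it easy to jump between backends in a single shell session.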
Conclusion
Claude Code is now highly flexible: it can remain on Anthropic’s API, run locally on a Mac M1 with devstral-small-2 (24 B) for privacy, leverage the larger GPU memory of an Nvidia DGX Spark for bigger models, or use inexpensive cloud providers such as Kimi, Minimax, DeepSeek, or GLM, which can be up to 98 % cheaper than Opus 4.5. The accompanying GitHub repository (https://github.com/luongnv89/claude-howto) contains the full scripts and configuration details.
AI Algorithm Path
A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.
