How to Integrate Claude Code with Ollama for Local and Cloud LLM Workflows
This guide walks through installing Claude Code and Ollama, pulling and configuring open-source models, setting environment variables, and running Claude Code against both local and cloud-hosted models, with notes on context length, performance, and tool calling.
Overview
Claude Code can now use Ollama as a backend, enabling private, cost-free development with local models, or scalable hosted models via Ollama Cloud.
Installation
Claude Code
macOS/Linux:

```bash
curl -fsSL https://claude.ai/install.sh | bash
```

Windows PowerShell:

```powershell
irm https://claude.ai/install.ps1 | iex
```

Windows CMD:

```
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
```

Verify the installation with `claude --version`.
Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh   # macOS/Linux/WSL
```

Windows users download the installer from https://ollama.com/download and run it. Verify with `ollama --version`. Ollama runs as a service at http://localhost:11434.
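To double-check that the service is up before going further, you can hit the version endpoint from a short script (a minimal sketch in Python; GET /api/version is part of Ollama's REST API):

```python
import json
import urllib.request

# Quick health check: Ollama's HTTP API answers on port 11434 by default.
with urllib.request.urlopen('http://localhost:11434/api/version') as resp:
    print(json.load(resp))  # e.g. {'version': '0.13.0'}
```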
Pull a model

```bash
ollama pull qwen3-coder
```

Other useful models:

```bash
ollama pull qwen2.5-coder:7b     # balanced 5 GB coding model
ollama pull starcoder2:3b        # compact 1.7 GB model
ollama pull qwen2.5-coder:1.5b   # lightweight 1 GB model
ollama pull deepseek-coder:1.3b  # smallest at 776 MB (no tool calls)
```

Confirm with `ollama list`.
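The same information is available programmatically; this sketch queries Ollama's GET /api/tags endpoint to list the pulled models:

```python
import json
import urllib.request

# List locally pulled models via Ollama's REST API (GET /api/tags).
with urllib.request.urlopen('http://localhost:11434/api/tags') as resp:
    for model in json.load(resp).get('models', []):
        print(model['name'], model.get('size'))
```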
Configure Environment Variables
Point Claude Code to the local Ollama endpoint:
```bash
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
```

PowerShell equivalent:

```powershell
$env:ANTHROPIC_AUTH_TOKEN="ollama"
$env:ANTHROPIC_BASE_URL="http://localhost:11434"
```

Persist these in .bashrc or .zshrc for future sessions.
Run Claude Code with a Model
Start Claude Code, specifying the model name:

```bash
claude --model qwen2.5-coder:7b
```

If a model (e.g., deepseek-coder:1.3b) does not support tool calls, switch to one that does, such as qwen2.5-coder:7b; the sketch below shows one way to probe for tool support before committing to a model.
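This probe is a sketch, not a definitive check: it assumes that a model lacking tool support surfaces an API error when given a tool definition, which matches the behavior described above but may vary by Ollama version.

```python
import anthropic

client = anthropic.Anthropic(base_url='http://localhost:11434', api_key='ollama')

def supports_tools(model: str) -> bool:
    """Send a trivial tool definition and see whether the server accepts it."""
    try:
        client.messages.create(
            model=model,
            max_tokens=16,
            tools=[{
                'name': 'noop',
                'description': 'Does nothing; used only to probe tool support.',
                'input_schema': {'type': 'object', 'properties': {}},
            }],
            messages=[{'role': 'user', 'content': 'Reply with the word ok.'}],
        )
        return True
    except anthropic.APIError:
        # Assumption: models without tool support fail here rather than
        # silently ignoring the tool definition.
        return False

print(supports_tools('deepseek-coder:1.3b'))  # expected False per the note above
print(supports_tools('qwen2.5-coder:7b'))     # expected True
```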
Context Length Considerations
Claude Code works best with models offering at least a 32k-token context window; 64k is ideal for large projects. Adjust the context length via a custom Modelfile:

```
FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768
```

Create the custom model and use it:

```bash
ollama create qwen-32k -f Modelfile
claude --model qwen-32k
```

Typical context sizes:
2048 tokens – default for many small models
8192 tokens – simple tasks
32768 tokens – Claude Code minimum
65536 tokens – complex projects
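To confirm the custom model actually carries the larger window, you can inspect it over the REST API (a sketch using POST /api/show; the response's parameters field should echo the Modelfile settings):

```python
import json
import urllib.request

# Inspect the custom model built above and look for the num_ctx parameter.
req = urllib.request.Request(
    'http://localhost:11434/api/show',
    data=json.dumps({'model': 'qwen-32k'}).encode(),
    headers={'Content-Type': 'application/json'},
)
with urllib.request.urlopen(req) as resp:
    info = json.load(resp)

print(info.get('parameters'))  # should include a line like: num_ctx 32768
```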
Hardware Requirements & Performance
~16 GB RAM comfortably runs 7B-parameter models.
20B+ models benefit from 32 GB or more of RAM.
Apple Silicon provides strong performance; older hardware may struggle with larger models.
7B models are fast enough for day-to-day coding; 20B models are slower but handle broader tasks such as documentation or architecture planning.
Using Ollama Cloud (Hosted Models)
When local resources are insufficient, register at https://ollama.com/cloud and obtain an API key. Set the variables:
```bash
export ANTHROPIC_BASE_URL=https://ollama.com
export ANTHROPIC_API_KEY=your-api-key-here
```

PowerShell:

```powershell
$env:ANTHROPIC_BASE_URL="https://ollama.com"
$env:ANTHROPIC_API_KEY="your-api-key-here"
```

Run a cloud model by appending :cloud to the name, e.g.:

```bash
claude --model glm-4.7:cloud
```

Cloud models run with full context length and require no local storage.
Tool‑Calling Example
The following Python snippet shows a tool call (weather lookup) using the local Ollama endpoint:
```python
import anthropic

# Point the Anthropic SDK at the local Ollama endpoint; the key is a placeholder.
client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

# Tool definition the model can choose to call.
weather_tool = {
    'name': 'get_weather',
    'description': 'Get the current weather in a location',
    'input_schema': {
        'type': 'object',
        'properties': {
            'location': {
                'type': 'string',
                'description': 'The city and state, e.g. San Francisco, CA',
            },
        },
        'required': ['location'],
    },
}

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}],
)

# Print any tool calls the model decided to make.
for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')
```
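In a real workflow the next step is to execute the requested tool and return the result to the model. This continuation is a sketch: get_weather here is a hypothetical stub standing in for a real weather API, and it reuses client, weather_tool, and message from the block above:

```python
def get_weather(location: str) -> str:
    # Hypothetical stub; a real implementation would call a weather service.
    return f'Sunny and 18°C in {location}'

# Execute each requested tool call and collect tool_result blocks.
tool_results = []
for block in message.content:
    if block.type == 'tool_use':
        tool_results.append({
            'type': 'tool_result',
            'tool_use_id': block.id,
            'content': get_weather(**block.input),
        })

# Send the results back so the model can produce its final answer.
followup = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[weather_tool],
    messages=[
        {'role': 'user', 'content': "What's the weather in San Francisco?"},
        {'role': 'assistant', 'content': message.content},
        {'role': 'user', 'content': tool_results},
    ],
)
print(followup.content[0].text)
```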
Key Takeaways
Integrating Claude Code with Ollama provides a flexible, privacy‑preserving workflow. Local models are ideal for private codebases and rapid iteration, while Ollama Cloud offers scalable performance for larger contexts or hardware‑limited setups.