How to Integrate Claude Code with Ollama for Local and Cloud LLM Workflows

This guide walks through installing Claude Code and Ollama, pulling and configuring open-source models, setting the required environment variables, and running Claude Code against both local and cloud-hosted models. It also covers context length, performance considerations, and a tool-calling example.

Overview

Claude Code can now use Ollama as a backend, so development can run against local models (private and cost-free) or Ollama's cloud-hosted models.

Installation

Claude Code

curl -fsSL https://claude.ai/install.sh | bash   # macOS/Linux/WSL
irm https://claude.ai/install.ps1 | iex   # PowerShell
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd   # Windows CMD

Verify with claude --version.

Ollama

curl -fsSL https://ollama.com/install.sh | sh   # macOS/Linux/WSL

Windows users download the installer from https://ollama.com/download and run it. Verify with ollama --version. Ollama runs as a service at http://localhost:11434.

Pull a model:

ollama pull qwen3-coder

Other useful models:

ollama pull qwen2.5-coder:7b – balanced 5 GB coding model
ollama pull starcoder2:3b – compact 1.7 GB model
ollama pull qwen2.5-coder:1.5b – lightweight 1 GB model
ollama pull deepseek-coder:1.3b – smallest at 776 MB (no tool-call support)

Confirm with ollama list .

Configure Environment Variables

Point Claude Code to the local Ollama endpoint:

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434

PowerShell equivalent:

$env:ANTHROPIC_AUTH_TOKEN="ollama"
$env:ANTHROPIC_BASE_URL="http://localhost:11434"

Persist these in .bashrc or .zshrc for future sessions.
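If you launch Claude Code from scripts rather than an interactive shell, the same two variables can be injected programmatically. A minimal sketch (the function name is made up for illustration; the values match the exports above):

```python
import os

def ollama_env(base_url: str = "http://localhost:11434") -> dict:
    """Copy the current environment and add the two variables Claude Code reads."""
    env = dict(os.environ)
    env["ANTHROPIC_AUTH_TOKEN"] = "ollama"  # placeholder value, as in the exports above
    env["ANTHROPIC_BASE_URL"] = base_url
    return env

# Pass the result when launching the CLI, e.g.:
# subprocess.run(["claude", "--model", "qwen2.5-coder:7b"], env=ollama_env())
```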

Run Claude Code with a Model

Start Claude Code specifying the model name:

claude --model qwen2.5-coder:7b

If a model (e.g., deepseek-coder:1.3b) does not support tool calls, switch to one that does, such as qwen2.5-coder:7b.

Context Length Considerations

Claude Code works best with models offering at least a 32k token window; 64k is ideal for large projects. Adjust the context via a custom Modelfile:

FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768

Create the custom model and use it:

ollama create qwen-32k -f Modelfile
claude --model qwen-32k
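If you maintain several context-extended variants, the two-line Modelfile can be generated rather than hand-written. A small sketch (the helper is hypothetical):

```python
def render_modelfile(base: str, num_ctx: int) -> str:
    """Render a minimal Ollama Modelfile that raises the context window."""
    return f"FROM {base}\nPARAMETER num_ctx {num_ctx}\n"

# Write it out, then run: ollama create qwen-32k -f Modelfile
with open("Modelfile", "w") as f:
    f.write(render_modelfile("qwen2.5-coder:7b", 32768))
```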

Typical context sizes:

2048 tokens – default for many small models

8192 tokens – simple tasks

32768 tokens – Claude Code minimum

65536 tokens – complex projects
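To sanity-check whether a prompt or codebase fits a given window, a common rule of thumb is roughly 4 characters per token for English text and code. This is a crude approximation, not a real tokenizer:

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token (rough rule of thumb)."""
    return len(text) // 4

def fits_in_context(text: str, num_ctx: int = 32768, reserve: int = 4096) -> bool:
    """Leave `reserve` tokens free for the model's reply and tool output."""
    return approx_tokens(text) + reserve <= num_ctx

print(fits_in_context("x" * 200_000))          # ~50k tokens: too big for 32k
print(fits_in_context("x" * 100_000, 65536))   # ~25k tokens: fits in 64k
```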

Hardware Requirements & Performance

~16 GB of RAM comfortably runs 7B-parameter models.

20B+ models benefit from 32 GB or more of RAM.

Apple Silicon provides strong performance; older hardware may struggle with larger models.

7B models are fast enough for day-to-day coding; 20B models are slower but handle broader tasks such as documentation or architecture planning.
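The RAM guidance follows from weight size: a quantized model needs roughly params × bits-per-weight ÷ 8 bytes for its weights, plus headroom for the KV cache and runtime. A back-of-the-envelope sketch, where the 4-bit quantization and 1.5× overhead factor are rough assumptions:

```python
def approx_weights_gb(params_billion: float, bits_per_weight: float = 4) -> float:
    """Memory for the quantized weights alone, in GB (treating 1 GB as 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

def rough_ram_needed_gb(params_billion: float, bits_per_weight: float = 4,
                        overhead: float = 1.5) -> float:
    """Weights plus a rough 1.5x factor for KV cache and runtime overhead."""
    return approx_weights_gb(params_billion, bits_per_weight) * overhead

print(round(rough_ram_needed_gb(7), 2))   # ~5.25 GB: comfortable on 16 GB
print(round(rough_ram_needed_gb(20), 2))  # ~15.0 GB: wants a 32 GB machine
```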

Using Ollama Cloud (Hosted Models)

When local resources are insufficient, register at https://ollama.com/cloud and obtain an API key. Set the variables:

export ANTHROPIC_BASE_URL=https://ollama.com
export ANTHROPIC_API_KEY=your-api-key-here

PowerShell:

$env:ANTHROPIC_BASE_URL="https://ollama.com"
$env:ANTHROPIC_API_KEY="your-api-key-here"

Run a cloud model by appending :cloud to the name, e.g.:

claude --model glm-4.7:cloud

Cloud models run with full context length and require no local storage.
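Scripted launches can switch to the cloud endpoint the same way; note that the variable changes from ANTHROPIC_AUTH_TOKEN to ANTHROPIC_API_KEY. A sketch (the helper name is illustrative):

```python
import os

def ollama_cloud_env(api_key: str) -> dict:
    """Copy the current environment and point Claude Code at Ollama Cloud."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = "https://ollama.com"
    env["ANTHROPIC_API_KEY"] = api_key           # key obtained from the cloud signup
    env.pop("ANTHROPIC_AUTH_TOKEN", None)        # drop any stale local setting
    return env

# e.g.: subprocess.run(["claude", "--model", "glm-4.7:cloud"], env=ollama_cloud_env(key))
```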

Tool‑Calling Example

The following Python snippet shows a tool call (weather lookup) using the local Ollama endpoint:

import anthropic

# Point the official Anthropic SDK at the local Ollama server instead of Anthropic's API
client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[{
        'name': 'get_weather',
        'description': 'Get the current weather in a location',
        'input_schema': {
            'type': 'object',
            'properties': {'location': {'type': 'string', 'description': 'The city and state, e.g. San Francisco, CA'}},
            'required': ['location']
        }
    }],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)

# Print any tool calls the model decided to make
for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')

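The snippet above stops at printing the tool call; in a real loop you would execute the tool locally and send the result back to the model. A sketch of the dispatch half, where get_weather is a stub invented for illustration (only the tool_result message shape in the comment comes from the Anthropic Messages API):

```python
def get_weather(location: str) -> str:
    """Stub tool for illustration; a real version would call a weather API."""
    return f"Sunny, 18°C in {location}"

TOOLS = {"get_weather": get_weather}

def dispatch(name: str, tool_input: dict) -> str:
    """Run the requested tool locally with the model-supplied arguments."""
    return TOOLS[name](**tool_input)

# The result is then returned to the model as a user message, e.g.:
# {'role': 'user', 'content': [{'type': 'tool_result',
#                               'tool_use_id': block.id,
#                               'content': dispatch(block.name, block.input)}]}
print(dispatch("get_weather", {"location": "San Francisco, CA"}))
```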
Key Takeaways

Integrating Claude Code with Ollama provides a flexible, privacy‑preserving workflow. Local models are ideal for private codebases and rapid iteration, while Ollama Cloud offers scalable performance for larger contexts or hardware‑limited setups.

Written by Code Mala Tang

Read source code together, write articles together, and enjoy spicy hot pot together.
