Unlock Faster, Cheaper Claude Code with Domestic LLMs: 3 Practical Solutions
Discover three practical ways to replace costly, slow Claude Code API calls with domestic large-language models (DeepSeek, Alibaba Cloud Bailian, and third-party relay services), offering lower latency, dramatically reduced fees, step-by-step configuration, performance benchmarks, and troubleshooting tips for developers.
Solution 1: DeepSeek (Native Anthropic‑compatible API)
DeepSeek offers a direct Anthropic‑compatible endpoint, allowing Claude Code to connect without a proxy.
Step 1 – Obtain API key
Register at https://platform.deepseek.com, create an API key, and select the "Claude Code" group.
Step 2 – Set environment variables (Linux/macOS)
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=YOUR_DEEPSEEK_API_KEY
export ANTHROPIC_MODEL=deepseek-reasoner
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

Windows PowerShell:
$env:ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
$env:ANTHROPIC_AUTH_TOKEN="YOUR_DEEPSEEK_API_KEY"
$env:ANTHROPIC_MODEL="deepseek-reasoner"
$env:CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"

Step 3 – Verify
cd your-project
claude

The connection log should display deepseek-reasoner or deepseek-chat as the active model.
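If the expected model does not appear, you can test the endpoint outside Claude Code. The following curl sketch assumes DeepSeek's Anthropic-compatible endpoint follows standard Anthropic Messages API conventions; the /v1/messages path and headers shown here are assumptions based on that convention, not taken from DeepSeek's documentation:

# Send a minimal Messages API request directly to the DeepSeek endpoint
curl https://api.deepseek.com/anthropic/v1/messages \
  -H "x-api-key: $ANTHROPIC_AUTH_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "deepseek-reasoner", "max_tokens": 64, "messages": [{"role": "user", "content": "hello"}]}'

A JSON reply containing a content block confirms the key and base URL are working; an HTTP 401 points to the API key, while a timeout points to the network path.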
Key characteristics
Very low cost (near‑free for daily coding).
Strong reasoning capabilities, especially with deepseek-reasoner.
Low latency (< 1 s) due to domestic direct connection.
Good Chinese language support.
Solution 2: Alibaba Cloud Bailian + Tongyi Qianwen (Enterprise‑grade)
Bailian provides a commercial Coding Plan optimized for Claude Code and supports dedicated coding models.
Configuration (edit ~/.claude/settings.json)
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "YOUR_BAILEI_API_KEY",
"ANTHROPIC_BASE_URL": "https://coding.dashscope.aliyuncs.com/apps/anthropic",
"ANTHROPIC_MODEL": "qwen3-coder-plus",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen-turbo"
}
}

Model mapping recommendations
Daily coding – qwen3-coder-plus (comparable to Claude 3.5 Sonnet).
Fast completion – qwen-turbo (comparable to Claude 3 Haiku).
Complex architecture – qwen-max (comparable to Claude 3 Opus).
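If you prefer environment variables over settings.json, the same configuration can be exported directly, as in Solution 1. A minimal sketch using the daily-coding tier from the mapping above:

# Equivalent environment-variable setup (Linux/macOS)
export ANTHROPIC_BASE_URL=https://coding.dashscope.aliyuncs.com/apps/anthropic
export ANTHROPIC_AUTH_TOKEN=YOUR_BAILIAN_API_KEY
export ANTHROPIC_MODEL=qwen3-coder-plus
export ANTHROPIC_SMALL_FAST_MODEL=qwen-turbo

Swap ANTHROPIC_MODEL for qwen-turbo or qwen-max when the task profile changes.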
Measured performance (representative figures)
Code‑generation accuracy: DeepSeek 85 % vs Claude Sonnet 4.5 90 %.
Response latency: DeepSeek ≈ 400 ms vs Claude ≈ 2 s (requires VPN).
Cost: ¥0.6 per M tokens (DeepSeek) vs $15 per M tokens + VPN overhead (Claude).
Chinese variable‑name understanding: DeepSeek 95 % vs Claude 80 %.
Solution 3: Third‑party Relay Services (Flexible)
Relay services allow use of the native Claude model while bypassing network restrictions.
Typical providers
xingjiabiapi.org – lower price, latency 0.5‑1 s, success rate > 98 %.
claude-code.com.cn – supports latest Claude Opus 4.5/4.6, domestic nodes.
api.yixia.ai – simple one‑click setup.
Configuration example (xingjiabiapi.org)
export ANTHROPIC_BASE_URL=https://xingjiabiapi.org
export ANTHROPIC_API_KEY=sk-YOUR_KEY

Or edit ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_API_KEY": "sk-YOUR_KEY",
"ANTHROPIC_BASE_URL": "https://xingjiabiapi.org",
"ANTHROPIC_MODEL": "claude-opus-4-5-20251101"
}
}
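Before moving on, it is worth confirming the relay is actually serving requests. A quick sanity check, assuming your Claude Code version provides the /status command for inspecting the active session:

# Launch Claude Code against the relay from your project directory
cd your-project
claude
# Inside the session, run /status and confirm the reported model is
# claude-opus-4-5-20251101 (or your configured model) rather than a fallback.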
Advanced Tip: Model Auto‑Switching

Claude Code can automatically select a model based on task complexity; manual configuration provides finer control.
Recommended environment configuration
{
"env": {
"ANTHROPIC_DEFAULT_OPUS_MODEL": "qwen-max",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "qwen3-coder-plus",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "qwen-turbo"
}
}

Scenario mapping
File reading / simple syntax checks → qwen-turbo (fast, cheap).
Code generation / refactoring → qwen3-coder-plus (balanced).
Complex architecture design / deep reasoning → qwen-max (strong reasoning).
Quick switch command (enter in Claude Code)
/model qwen-max

Common Troubleshooting
Invalid API Key – Ensure the key starts with sk- and has the required permissions.
Failed to connect to api.anthropic.com – Add CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 and verify hasCompletedOnboarding: true in the config file.
Poor code‑generation quality – Adjust the model temperature to a range of 0.2‑0.5; lower temperatures improve stability for domestic models.
Multi‑model orchestration – Use LiteLLM to build a local gateway that aggregates multiple model APIs (requires Python); a minimal sketch follows below.
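A sketch of that gateway, assuming LiteLLM's proxy mode and its Anthropic-compatible /v1/messages endpoint; the file name, port, and model aliases are illustrative, and qwen3-coder-plus is routed here through DashScope's OpenAI-compatible endpoint rather than a dedicated provider:

# Install the LiteLLM proxy
pip install 'litellm[proxy]'

# Describe the upstream models the gateway should front
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: deepseek-reasoner            # alias Claude Code will request
    litellm_params:
      model: deepseek/deepseek-reasoner      # LiteLLM's DeepSeek route
      api_key: os.environ/DEEPSEEK_API_KEY   # read the key from the environment
  - model_name: qwen3-coder-plus
    litellm_params:
      model: openai/qwen3-coder-plus         # generic OpenAI-compatible route
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY
EOF

# Start the local gateway and point Claude Code at it
litellm --config litellm_config.yaml --port 4000
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_MODEL=deepseek-reasoner

Claude Code then talks only to the local gateway, which routes each alias to its upstream provider.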