Unlock Faster, Cheaper Claude Code with Domestic LLMs: 3 Practical Solutions
Discover three practical ways to replace costly, slow Claude Code API calls with domestic large-language models (DeepSeek, Alibaba Cloud Bailian, and third-party relay services), offering lower latency, dramatically reduced fees, step-by-step configuration, performance benchmarks, and troubleshooting tips for developers.
Solution 1: DeepSeek (Native Anthropic‑compatible API)
DeepSeek offers a direct Anthropic‑compatible endpoint, allowing Claude Code to connect without a proxy.
Step 1 – Obtain API key
Register at https://platform.deepseek.com, create an API key, and select the "Claude Code" group.
Step 2 – Set environment variables (Linux/macOS)
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=YOUR_DEEPSEEK_API_KEY
export ANTHROPIC_MODEL=deepseek-reasoner
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

Windows PowerShell:
$env:ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
$env:ANTHROPIC_AUTH_TOKEN="YOUR_DEEPSEEK_API_KEY"
$env:ANTHROPIC_MODEL="deepseek-reasoner"
$env:CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"

Step 3 – Verify
cd your-project
claude

The connection log should display deepseek-reasoner or deepseek-chat as the active model.
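If the expected model does not appear, you can test the endpoint outside Claude Code. The following curl sketch assumes DeepSeek's Anthropic-compatible endpoint follows standard Anthropic Messages API conventions; the /v1/messages path and headers shown here are assumptions based on that convention, not taken from DeepSeek's documentation:

# Send a minimal Messages API request directly to the DeepSeek endpoint
curl https://api.deepseek.com/anthropic/v1/messages \
  -H "x-api-key: $ANTHROPIC_AUTH_TOKEN" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "deepseek-reasoner", "max_tokens": 64, "messages": [{"role": "user", "content": "hello"}]}'

A JSON reply containing a content block confirms the key and base URL are working; an HTTP 401 points to the API key, while a timeout points to the network path.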
Key characteristics
Very low cost (near‑free for daily coding).
Strong reasoning capabilities, especially with deepseek-reasoner.
Low latency (< 1 s) due to domestic direct connection.
Good Chinese language support.
Solution 2: Alibaba Cloud Bailian + Tongyi Qianwen (Enterprise‑grade)
Bailian provides a commercial Coding Plan optimized for Claude Code and supports dedicated coding models.
Configuration (edit ~/.claude/settings.json)
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "YOUR_BAILEI_API_KEY",
"ANTHROPIC_BASE_URL": "https://coding.dashscope.aliyuncs.com/apps/anthropic",
"ANTHROPIC_MODEL": "qwen3-coder-plus",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen-turbo"
}
}

Model mapping recommendations
Daily coding – qwen3-coder-plus (comparable to Claude 3.5 Sonnet).
Fast completion – qwen-turbo (comparable to Claude 3 Haiku).
Complex architecture – qwen-max (comparable to Claude 3 Opus).
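If you prefer environment variables over settings.json, the same configuration can be exported directly, as in Solution 1. A minimal sketch using the daily-coding tier from the mapping above:

# Equivalent environment-variable setup (Linux/macOS)
export ANTHROPIC_BASE_URL=https://coding.dashscope.aliyuncs.com/apps/anthropic
export ANTHROPIC_AUTH_TOKEN=YOUR_BAILIAN_API_KEY
export ANTHROPIC_MODEL=qwen3-coder-plus
export ANTHROPIC_SMALL_FAST_MODEL=qwen-turbo

Swap ANTHROPIC_MODEL for qwen-turbo or qwen-max when the task profile changes.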
Measured performance (representative figures)
Code‑generation accuracy: DeepSeek 85 % vs Claude Sonnet 4.5 90 %.
Response latency: DeepSeek ≈ 400 ms vs Claude ≈ 2 s (requires VPN).
Cost: ¥0.6 per M tokens (DeepSeek) vs $15 per M tokens + VPN overhead (Claude).
Chinese variable‑name understanding: DeepSeek 95 % vs Claude 80 %.
Solution 3: Third‑party Relay Services (Flexible)
Relay services allow use of the native Claude model while bypassing network restrictions.
Typical providers
xingjiabiapi.org – lower price, latency 0.5‑1 s, success rate > 98 %.
claude-code.com.cn – supports latest Claude Opus 4.5/4.6, domestic nodes.
api.yixia.ai – simple one‑click setup.
Configuration example (xingjiabiapi.org)
export ANTHROPIC_BASE_URL=https://xingjiabiapi.org
export ANTHROPIC_API_KEY=sk-YOUR_KEY

Or edit ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_API_KEY": "sk-YOUR_KEY",
"ANTHROPIC_BASE_URL": "https://xingjiabiapi.org",
"ANTHROPIC_MODEL": "claude-opus-4-5-20251101"
}
}
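Before moving on, it is worth confirming the relay is actually serving requests. A quick sanity check, assuming your Claude Code version provides the /status command for inspecting the active session:

# Launch Claude Code against the relay from your project directory
cd your-project
claude
# Inside the session, run /status and confirm the reported model is
# claude-opus-4-5-20251101 (or your configured model) rather than a fallback.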
Advanced Tip: Model Auto‑Switching

Claude Code can automatically select a model based on task complexity; manual configuration provides finer control.
Recommended environment configuration
{
"env": {
"ANTHROPIC_DEFAULT_OPUS_MODEL": "qwen-max",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "qwen3-coder-plus",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "qwen-turbo"
}
}

Scenario mapping
File reading / simple syntax checks → qwen-turbo (fast, cheap).
Code generation / refactoring → qwen3-coder-plus (balanced).
Complex architecture design / deep reasoning → qwen-max (strong reasoning).
Quick switch command (enter in Claude Code)
/model qwen-max

Common Troubleshooting
Invalid API Key – Ensure the key starts with sk- and has the required permissions.
Failed to connect to api.anthropic.com – Add CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 and verify hasCompletedOnboarding: true in the config file.
Poor code‑generation quality – Adjust the model temperature to a range of 0.2‑0.5; lower temperatures improve stability for domestic models.
Multi‑model orchestration – Use LiteLLM to build a local gateway that aggregates multiple model APIs (requires Python); a minimal sketch follows below.
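A sketch of that gateway, assuming LiteLLM's proxy mode and its Anthropic-compatible /v1/messages endpoint; the file name, port, and model aliases are illustrative, and qwen3-coder-plus is routed here through DashScope's OpenAI-compatible endpoint rather than a dedicated provider:

# Install the LiteLLM proxy
pip install 'litellm[proxy]'

# Describe the upstream models the gateway should front
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: deepseek-reasoner            # alias Claude Code will request
    litellm_params:
      model: deepseek/deepseek-reasoner      # LiteLLM's DeepSeek route
      api_key: os.environ/DEEPSEEK_API_KEY   # read the key from the environment
  - model_name: qwen3-coder-plus
    litellm_params:
      model: openai/qwen3-coder-plus         # generic OpenAI-compatible route
      api_base: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: os.environ/DASHSCOPE_API_KEY
EOF

# Start the local gateway and point Claude Code at it
litellm --config litellm_config.yaml --port 4000
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_MODEL=deepseek-reasoner

Claude Code then talks only to the local gateway, which routes each alias to its upstream provider.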