How to Build a Cost‑Efficient Multi‑AI Team with Claude Code
This article details a hands‑on experiment that turns Claude Code into a virtual AI team—splitting project‑manager, designer, programmer and QA roles into separate agents, using file‑based communication, strict CLAUDE.md contracts, and token‑saving techniques such as timestamp checks and model‑specific task routing.
❝ This technical exploration builds a multi‑AI framework on Claude Code where each AI fulfills a dedicated role (project manager, designer, programmer, QA) and collaborates through shared JSON files. The design also introduces token‑cost controls. ❞
Motivation
Using a single Claude instance for a complex project quickly leads to:
Context bloat – the prompt grows with every interaction.
Token waste – repeated analysis of already‑known information.
Role confusion – the agent must act as manager, designer, coder, and tester simultaneously.
Separating responsibilities into independent agents solves these problems.
Architecture
Four agents are defined, each in its own sub‑directory under agentGroup/:
agentGroup/
├── max/ # project‑manager AI
│ ├── CLAUDE.md # persona & rules
│ └── skills/ # optional skill packages
├── ella/ # UI/UX designer AI
│ ├── CLAUDE.md
│ └── skills/
├── jarvis/ # programmer AI
│ ├── CLAUDE.md
│ └── skills/
├── kyle/ # QA engineer AI
│ ├── CLAUDE.md
│ └── skills/
└── shared/ # communication hub
  ├── status.json
  ├── notifications.json
  ├── tasks/
  ├── docs/
  ├── designs/
  └── reviews/
Each agent runs in an isolated Claude project instance (claude --project <name>) and reads and writes the JSON files in shared/ to exchange information.
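As a minimal sketch, the layout above could be scaffolded with a short Python script. The helper name `scaffold`, the placeholder CLAUDE.md contents, and the empty seed values for the two JSON files are assumptions for illustration; the original project may set these up differently.

```python
import json
import tempfile
from pathlib import Path

AGENTS = ["max", "ella", "jarvis", "kyle"]           # roles from the tree above
SHARED_DIRS = ["tasks", "docs", "designs", "reviews"]


def scaffold(root: Path) -> Path:
    """Create the agentGroup/ layout under `root` (hypothetical helper)."""
    group = root / "agentGroup"
    for agent in AGENTS:
        (group / agent / "skills").mkdir(parents=True, exist_ok=True)
        claude_md = group / agent / "CLAUDE.md"
        if not claude_md.exists():
            # Placeholder only; the real persona and rules go here.
            claude_md.write_text(f"# {agent}\n", encoding="utf-8")

    shared = group / "shared"
    for name in SHARED_DIRS:
        (shared / name).mkdir(parents=True, exist_ok=True)

    # Seed the two JSON files with empty structures (assumed defaults).
    defaults = {
        "status.json": {"current_task": None, "notifications": [], "completed_tasks": []},
        "notifications.json": {"notifications": []},
    }
    for filename, content in defaults.items():
        path = shared / filename
        if not path.exists():
            path.write_text(json.dumps(content, indent=2), encoding="utf-8")
    return group


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(sorted(p.name for p in scaffold(Path(tmp)).iterdir()))
```

The function is idempotent: running it again leaves existing CLAUDE.md and JSON files untouched.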
File‑based communication
Typical shared files:
// shared/status.json – team status board
{
"current_task": "Develop personal website",
"notifications": [],
"last_updated": "2026-02-14T15:45:00Z",
"completed_tasks": ["Requirements analysis", "Prototype design"]
}
// shared/notifications.json – internal message system
{
"notifications": [
{
"from": "max",
"to": "jarvis",
"subject": "Urgent bug fix",
"content": {
"file": "frontend/LoginForm.vue",
"issue": "Login button does not respond to clicks",
"hint": "Check the handleLogin method"
}
}
]
}
Benefits of this approach:
Zero configuration – no databases or message brokers.
Native support – Claude can read/write JSON directly.
Version control – all files live in Git, enabling rollback.
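To make the message format concrete, here is a hedged sketch that parses the contents of shared/notifications.json into typed objects and rejects entries with missing fields. The class name `Notification`, the attribute name `sender`, and the choice of which fields are required are assumptions for illustration, not part of the original project.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Notification:
    sender: str                      # "from" in the JSON ("from" is a Python keyword)
    to: str
    subject: str
    content: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Notification":
        # Assumed required fields; "content" is treated as optional.
        missing = {"from", "to", "subject"} - set(raw)
        if missing:
            raise ValueError(f"notification missing fields: {sorted(missing)}")
        return cls(raw["from"], raw["to"], raw["subject"], raw.get("content", {}))


def parse_notifications(text: str) -> list[Notification]:
    """Parse the text of shared/notifications.json into Notification objects."""
    data = json.loads(text)
    return [Notification.from_dict(item) for item in data.get("notifications", [])]
```

Validating at read time means a malformed entry fails loudly at the point of reading instead of surfacing later as confusing agent behavior.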
Interaction example
Update shared/notifications.json with a new task.
Update shared/status.json to record the task.
Agent max replies: "Jarvis has been notified and the task has been recorded."
When the user switches to jarvis, it reads the notification, acknowledges the bug, and starts processing.
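The steps above can be sketched in Python as two small functions: one for the sending side (append the message, then update the status board) and one for the receiving side. The function names, the choice to store the subject as `current_task`, and the fallback defaults are assumptions; in the article the agents perform these file edits themselves through Claude Code.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def _load(path: Path, default: dict) -> dict:
    """Read a JSON file, or return `default` if it does not exist yet."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else default


def post_notification(shared: Path, sender: str, to: str, subject: str, content: dict) -> None:
    """Append a message to notifications.json, then record the task in status.json."""
    notes_path = shared / "notifications.json"
    notes = _load(notes_path, {"notifications": []})
    notes["notifications"].append(
        {"from": sender, "to": to, "subject": subject, "content": content}
    )
    notes_path.write_text(json.dumps(notes, ensure_ascii=False, indent=2), encoding="utf-8")

    status_path = shared / "status.json"
    status = _load(status_path, {"completed_tasks": []})
    status["current_task"] = subject  # assumption: the subject doubles as the task name
    status["last_updated"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    status_path.write_text(json.dumps(status, ensure_ascii=False, indent=2), encoding="utf-8")


def inbox(shared: Path, agent: str) -> list[dict]:
    """Return the notifications addressed to `agent`."""
    notes = _load(shared / "notifications.json", {"notifications": []})
    return [n for n in notes["notifications"] if n["to"] == agent]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        post_notification(shared, "max", "jarvis", "Urgent bug fix",
                          {"file": "frontend/LoginForm.vue"})
        print(inbox(shared, "jarvis"))
```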
Ensuring deterministic behavior
After a restart, agents lost their workflow (e.g., max stopped performing the initial scope check). The fix is to embed a mandatory checkpoint contract in each CLAUDE.md file.
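## ⚡ Mandatory workflow (cannot be bypassed)
**After receiving a user message, execute the following steps in order**
0️⃣ Scope confirmation: "📋 Task scope confirmation: [clear / needs clarification]"
1️⃣ Strategy read: must use the `Read` tool to read `token-optimization.md`
2️⃣ Notification check: run `check_notifications_simple.sh`
3️⃣ Task decomposition: decide whether the task needs to be split into subtasks
4️⃣ Skill check: assess whether a specialized skill is available
5️⃣ Execution choice: select the model and decide how to execute
6️⃣ Git safety: check whether Git operations need authorization
A self‑monitoring guard re‑executes the full chain if any step is skipped: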
## Self-monitoring protocol
IF (any checkpoint skipped) THEN {
🛑 STOP current operation
🔴 OUTPUT "⚠️ Workflow violation detected, forcing correction..."
✅ RE‑EXECUTE all checkpoints
}
Token optimization – two‑layer strategy
Layer 1: Timestamp check
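Repeatedly reading unchanged files wastes tokens. A shell script compares the file's modification time (mtime) with a cached value and skips the read when the two match, saving about 97% of read-related tokens. The script exits with 0 when nothing changed and with 1 when the agent should read the file.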
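# Modification time in seconds since the epoch.
# GNU/Linux stat uses -c %Y; BSD/macOS stat uses -f %m. Falls back to 0 if both fail.
current_mtime=$(stat -c %Y "$NOTIFICATIONS_FILE" 2>/dev/null || stat -f %m "$NOTIFICATIONS_FILE" 2>/dev/null || echo "0")
# mtime recorded by the previous run (0 if no cache exists yet).
last_mtime=$(cat "$CACHE_FILE" 2>/dev/null || echo "0")
if [ "$current_mtime" = "$last_mtime" ]; then
  echo "File unchanged, skipping read"
  exit 0 # zero tokens spent
else
  echo "File updated, read required"
  echo "$current_mtime" > "$CACHE_FILE"
  exit 1 # trigger read
fi
Layer 2: Model‑specific task routing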
Claude Code's Task tool can specify the model per sub-task. Splitting a large report into three subtasks reduces the cost from about $0.24 to about $0.13, a saving of roughly 46%:
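# Inefficient – all Sonnet (≈ $0.24)
Task(prompt="Analyze this system architecture, identify problems, and generate a report")
# Optimized – mixed models (≈ $0.13)
Task(model="haiku", prompt="Extract all API endpoints and database tables from the code") # data extraction
Task(model="sonnet", prompt="Analyze architecture design problems and performance bottlenecks") # deep analysis
Task(model="haiku", prompt="Format the analysis results into a standard report") # formatting
Typical model recommendations: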
Haiku – pure data extraction, format conversion, simple validation.
Sonnet – logical analysis, design review, debugging.
Opus – innovative design or strategic decisions.
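The recommendations above amount to a lookup table, and the quoted saving is simple arithmetic. The sketch below shows both. The task-kind labels, the function names, and the `sonnet` fallback for unknown kinds are assumptions for illustration; the dollar figures are the article's own estimates, not measured prices.

```python
# Task kind -> model tier, following the recommendations listed above.
ROUTES = {
    "extract": "haiku",
    "format": "haiku",
    "validate": "haiku",
    "analyze": "sonnet",
    "review": "sonnet",
    "debug": "sonnet",
    "design": "opus",
    "strategy": "opus",
}


def pick_model(task_kind: str, default: str = "sonnet") -> str:
    """Return the model tier for a task kind (unknown kinds use `default`)."""
    return ROUTES.get(task_kind, default)


def saving_percent(baseline: float, optimized: float) -> float:
    """Relative saving of `optimized` over `baseline`, in percent."""
    return (baseline - optimized) / baseline * 100


if __name__ == "__main__":
    plan = ["extract", "analyze", "format"]      # the three subtasks from the example
    print([pick_model(kind) for kind in plan])
    print(round(saving_percent(0.24, 0.13)))     # the article's cost estimates
```

Keeping the mapping in one table makes each routing decision explicit and easy to adjust when a task kind turns out to need a stronger model.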
Cost predictability
Traditional multi‑agent setups hide token consumption and can spike unexpectedly. The agentGroup approach provides:
Predictable token usage per step.
Explicit model selection per sub‑task.
Real‑time token breakdown for each interaction.
User‑driven optimization loops.
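One way to picture the per-step breakdown is a small ledger that records token counts for each step and model. This is a sketch under stated assumptions: the class `TokenLedger` is hypothetical, how the counts are obtained is outside its scope, and the numbers in the demo are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Accumulates (step, model, input_tokens, output_tokens) records."""
    entries: list[tuple[str, str, int, int]] = field(default_factory=list)

    def record(self, step: str, model: str, input_tokens: int, output_tokens: int) -> None:
        self.entries.append((step, model, input_tokens, output_tokens))

    def by_model(self) -> dict[str, int]:
        """Total tokens (input + output) per model."""
        totals: dict[str, int] = defaultdict(int)
        for _step, model, tokens_in, tokens_out in self.entries:
            totals[model] += tokens_in + tokens_out
        return dict(totals)

    def report(self) -> str:
        """One line per recorded step, in the order the steps ran."""
        return "\n".join(
            f"{step:<12} {model:<7} in={tokens_in:>6} out={tokens_out:>6}"
            for step, model, tokens_in, tokens_out in self.entries
        )


if __name__ == "__main__":
    ledger = TokenLedger()
    ledger.record("extract", "haiku", 1200, 300)   # illustrative numbers only
    ledger.record("analyze", "sonnet", 2000, 800)
    print(ledger.report())
    print(ledger.by_model())
```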
Limitations
Passive notifications – users must poll /status to see updates.
File dependency – a corrupted JSON file breaks the workflow.
Learning curve – newcomers need to understand four roles and the custom CLI.
Response latency – file I/O adds overhead compared with direct API calls.
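The file-dependency risk can be reduced, though not eliminated, by writing JSON atomically and tolerating unreadable files on load. The sketch below is one possible mitigation and is not part of the original project; both function names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_json_atomic(path: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # On POSIX the rename is atomic when both paths are on the same filesystem,
        # so readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_json_or_default(path: Path, default: dict) -> dict:
    """Return the parsed JSON, or `default` if the file is missing or malformed."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```

Falling back to a default keeps an agent running after a bad write, but it also hides the corruption, so a real setup would probably log the failure as well.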
Comparison with Claude’s official Agent Team
In each pair below, the agentGroup approach is listed first and the official Agent Team second.
Collaboration method: file system vs. native API calls.
Notification: passive polling vs. active push.
Closed‑loop: manual intervention required vs. automatic.
Customizability: fully controllable vs. platform‑imposed limits.
Cost control: fine‑grained token budgeting vs. standard pricing.
Technical barrier: configuration needed vs. out‑of‑the‑box.
When to use this architecture
Long‑running projects that need persistent state.
Token‑sensitive workloads where every token matters.
Scenarios requiring deep customization of AI behavior.
Learning how multi‑AI collaboration works under the hood.
It is less suitable for one‑off quick tasks, real‑time collaboration, or latency‑critical applications.
Getting started
Create a single Claude instance and define a dedicated CLAUDE.md persona.
Add a JSON status file to replace repetitive queries.
Write a simple shell script that checks file mtimes before reading.
Scale by adding more agents, designing a notification schema, and instrumenting token‑monitoring scripts.
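The mtime check from the third step can also be written in Python, which may be convenient when the rest of the tooling is not shell-based. This is a sketch and not the project's own script; the function name and the cache file name used in the demo are assumptions.

```python
import os
import tempfile
from pathlib import Path


def changed_since_last_check(target: Path, cache: Path) -> bool:
    """Return True when `target` should be re-read; remember its mtime in `cache`."""
    try:
        current = str(target.stat().st_mtime_ns)
    except FileNotFoundError:
        current = "0"                 # a missing file is treated like mtime 0
    last = cache.read_text().strip() if cache.exists() else "0"
    if current == last:
        return False                  # unchanged: skip the read
    cache.write_text(current)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "notifications.json"
        cache = Path(tmp) / ".mtime_cache"
        target.write_text("{}")
        print(changed_since_last_check(target, cache))   # first check after a write
        print(changed_since_last_check(target, cache))   # nothing changed in between
        os.utime(target, (1_000_000_000, 1_000_000_000))  # simulate an edit via mtime
        print(changed_since_last_check(target, cache))
```

Like the shell script, this compares for equality only, so any change in mtime, forward or backward, triggers a read.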
Conclusion
Specializing AI agents and enforcing observable contracts reduces token waste (by up to 85% in some scenarios) and yields reliable, cost-predictable behavior. The framework demonstrates that AI is most valuable when treated as a set of specialized tools rather than a universal assistant.
Project repository: https://github.com/yezannnnn/agentGroup
