How to Build a Cost‑Efficient Multi‑AI Team with Claude Code

This article details a hands‑on experiment that turns Claude Code into a virtual AI team—splitting project‑manager, designer, programmer and QA roles into separate agents, using file‑based communication, strict CLAUDE.md contracts, and token‑saving techniques such as timestamp checks and model‑specific task routing.

Rare Earth Juejin Tech Community

Mar 11, 2026

❝ This technical exploration builds a multi‑AI framework on Claude Code where each AI fulfills a dedicated role (project manager, designer, programmer, QA) and collaborates through shared JSON files. The design also introduces token‑cost controls. ❞

Motivation

Using a single Claude instance for a complex project quickly leads to:

Context bloat – the prompt grows with every interaction.

Token waste – repeated analysis of already‑known information.

Role confusion – the agent must act as manager, designer, coder, and tester simultaneously.

Separating responsibilities into independent agents solves these problems.

Architecture

Four agents are defined, each in its own sub‑directory under agentGroup/:

agentGroup/
├── max/      # project‑manager AI
│   ├── CLAUDE.md      # persona & rules
│   └── skills/        # optional skill packages
├── ella/     # UI/UX designer AI
│   ├── CLAUDE.md
│   └── skills/
├── jarvis/   # programmer AI
│   ├── CLAUDE.md
│   └── skills/
├── kyle/     # QA engineer AI
│   ├── CLAUDE.md
│   └── skills/
└── shared/   # communication hub
    ├── status.json
    ├── notifications.json
    ├── tasks/
    ├── docs/
    ├── designs/
    └── reviews/

Each agent runs in its own isolated Claude project instance (claude --project <name>) and exchanges information with the others by reading and writing the JSON files in shared/.
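
The layout above can be created by hand, but a short script makes it repeatable. The following is a minimal sketch, not the project's own tooling: the `scaffold` function and the seed contents of the two JSON files are assumptions made for illustration.

```python
import json
from pathlib import Path

AGENTS = ["max", "ella", "jarvis", "kyle"]
SHARED_DIRS = ["tasks", "docs", "designs", "reviews"]


def scaffold(root: Path) -> None:
    """Create the agentGroup/ layout without overwriting existing files."""
    for agent in AGENTS:
        (root / agent / "skills").mkdir(parents=True, exist_ok=True)
        (root / agent / "CLAUDE.md").touch(exist_ok=True)

    shared = root / "shared"
    for sub in SHARED_DIRS:
        (shared / sub).mkdir(parents=True, exist_ok=True)

    # Seed contents are an assumption; the article only shows populated files.
    seeds = {
        "status.json": {"current_task": None, "notifications": [], "completed_tasks": []},
        "notifications.json": {"notifications": []},
    }
    for name, seed in seeds.items():
        path = shared / name
        if not path.exists():
            path.write_text(json.dumps(seed, indent=2), encoding="utf-8")
```

Calling `scaffold(Path("agentGroup"))` twice is safe: existing CLAUDE.md files and JSON files are left untouched.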

File‑based communication

Typical shared files:

// shared/status.json – team status board
{
  "current_task": "Develop personal website",
  "notifications": [],
  "last_updated": "2026-02-14T15:45:00Z",
  "completed_tasks": ["Requirements analysis", "Prototype design"]
}

// shared/notifications.json – internal message system
{
  "notifications": [
    {
      "from": "max",
      "to": "jarvis",
      "subject": "Urgent bug fix",
      "content": {
        "file": "frontend/LoginForm.vue",
        "issue": "Login button does not respond to clicks",
        "hint": "Check the handleLogin method"
      }
    }
  ]
}

Benefits of this approach:

Zero configuration – no databases or message brokers.

Native support – Claude can read/write JSON directly.

Version control – all files live in Git, enabling rollback.
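
Because agents share plain files, a writer interrupted mid-write can leave a truncated file behind. The sketch below shows one way to append a message safely; `post_notification` is a hypothetical helper, not part of the project, and it assumes the notifications.json schema shown above.

```python
import json
import os
import tempfile
from pathlib import Path


def post_notification(path: Path, sender: str, recipient: str,
                      subject: str, content: dict) -> None:
    """Append one message to notifications.json, replacing the file atomically."""
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"notifications": []}

    data["notifications"].append(
        {"from": sender, "to": recipient, "subject": subject, "content": content}
    )

    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)
```

This guards against partial writes only. Two agents writing at the same moment can still overwrite each other's update, which is less of a concern when the user drives one agent at a time.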

Interaction example

When the user reports a task to max (for example, an urgent bug), max:

Updates shared/notifications.json with the new task.

Updates shared/status.json to record the task.

Replies: "Jarvis has been notified and the task has been recorded."

When the user switches to jarvis, it reads the notification, acknowledges the bug, and starts processing.
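
The receiving side of this exchange amounts to filtering the shared file by recipient. A minimal sketch, assuming the notifications.json schema shown earlier; `inbox_for` is a hypothetical name:

```python
import json
from pathlib import Path


def inbox_for(path: Path, agent: str) -> list:
    """Return the notifications addressed to `agent`, in file order."""
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    return [n for n in data.get("notifications", []) if n.get("to") == agent]
```

For the example above, `inbox_for(Path("shared/notifications.json"), "jarvis")` would return the single bug-fix message from max.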

Ensuring deterministic behavior

After a restart, agents lost their workflow (e.g., max stopped performing the initial scope check). The fix is to embed a mandatory checkpoint contract in each CLAUDE.md file.

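## ⚡ Mandatory flow (cannot be bypassed)

**After receiving a user message, execute the following steps in order**

0️⃣ Task scope confirmation: "📋 Task scope confirmation: [clear / needs clarification]"
1️⃣ Strategy read: must use the `Read` tool to read `token-optimization.md`
2️⃣ Notification check: run `check_notifications_simple.sh`
3️⃣ Task decomposition: decide whether the task needs to be split into subtasks
4️⃣ Skill check: assess whether a specialized skill is available
5️⃣ Execution choice: select the model and decide how to execute
6️⃣ Git safety: detect whether Git operations require authorization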

A self‑monitoring guard re‑executes the full chain if any step is skipped:

## Self‑monitoring protocol
IF (any checkpoint skipped) THEN {
  🛑 STOP current operation
  🔴 OUTPUT "⚠️ Process violation detected, forcing correction..."
  ✅ RE‑EXECUTE all checkpoints
}
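
The contract above lives in the prompt, so it depends on the model policing itself. If an agent's output were parsed into a list of the steps it reported, the same rule could also be checked from outside. The following is a sketch of that idea only: the step identifiers and the `audit` function are hypothetical, and the article does not describe such a checker.

```python
# Hypothetical identifiers for the seven checkpoints, in the required order.
CHECKPOINTS = [
    "scope_confirmation",
    "strategy_read",
    "notification_check",
    "task_decomposition",
    "skill_check",
    "execution_choice",
    "git_safety",
]


def audit(executed: list) -> list:
    """Return the checkpoints that were skipped or run out of order."""
    violations = []
    position = 0  # where in `executed` the search for the next checkpoint starts
    for step in CHECKPOINTS:
        try:
            position = executed.index(step, position) + 1
        except ValueError:
            # Not found at or after the previous checkpoint's position.
            violations.append(step)
    return violations
```

An empty result means every checkpoint appeared in the required order; anything else names the steps to re-run.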

Token optimization – two‑layer strategy

Layer 1: Timestamp check

Repeatedly reading unchanged files wastes tokens. A shell script compares the file’s modification time (mtime) with a cached value and skips the read when nothing has changed; the author reports that this saves about 97% of read‑related tokens.

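#!/usr/bin/env bash
# Default paths are illustrative; override them through the environment.
NOTIFICATIONS_FILE="${NOTIFICATIONS_FILE:-shared/notifications.json}"
CACHE_FILE="${CACHE_FILE:-.notifications_mtime}"

# GNU/Linux stat uses -c %Y; BSD/macOS stat uses -f %m. Try GNU first.
current_mtime=$(stat -c %Y "$NOTIFICATIONS_FILE" 2>/dev/null \
  || stat -f %m "$NOTIFICATIONS_FILE" 2>/dev/null || echo "0")
last_mtime=$(cat "$CACHE_FILE" 2>/dev/null || echo "0")
if [ "$current_mtime" = "$last_mtime" ]; then
  echo "File unchanged, skipping read"
  exit 0   # nothing to read, so no tokens are spent
else
  echo "File updated, read required"
  echo "$current_mtime" > "$CACHE_FILE"
  exit 1   # signal the caller to read the file
fi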
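
The same check can be written without depending on a particular `stat` flavor. The sketch below is a Python equivalent offered as an assumption-laden alternative, not the project's script; `read_if_changed` and its cache-file format are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_if_changed(target: Path, cache: Path) -> Optional[str]:
    """Return the file's text if its mtime differs from the cached one, else None."""
    try:
        current = str(os.stat(target).st_mtime_ns)
    except FileNotFoundError:
        current = "0"  # same sentinel the shell script uses for a missing file

    last = cache.read_text().strip() if cache.exists() else "0"
    if current == last:
        return None  # unchanged: nothing to feed to the model

    cache.write_text(current)
    if current == "0":
        return None  # file disappeared since the last check
    return target.read_text(encoding="utf-8")
```

Either version can miss a change when two writes land within the filesystem's timestamp granularity; comparing a content hash avoids that at the price of reading the file every time.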

Layer 2: Model‑specific task routing

Claude Code’s Task tool can specify the model per sub‑task. In the author’s example, splitting a large report into three subtasks and sending each to the cheapest suitable model reduces the cost from about $0.24 to about $0.13 (a saving of roughly 46%):

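# Inefficient – all Sonnet (≈ $0.24)
Task(prompt="Analyze this system architecture, identify problems, and generate a report")

# Optimized – mixed models (≈ $0.13)
Task(model="haiku",  prompt="Extract all API endpoints and database tables from the code")    # data extraction
Task(model="sonnet", prompt="Analyze architectural design problems and performance bottlenecks")  # deep analysis
Task(model="haiku",  prompt="Format the analysis results into a standardized report")         # formatting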
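
The quoted saving follows directly from the two cost figures. A small worked check, using only the numbers given in the article (`saving_fraction` is a hypothetical helper name):

```python
def saving_fraction(baseline: float, optimized: float) -> float:
    """Fraction of the baseline cost that the optimized run avoids."""
    return (baseline - optimized) / baseline


# (0.24 - 0.13) / 0.24 = 0.11 / 0.24
print(f"{saving_fraction(0.24, 0.13):.0%}")  # prints 46%
```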

Typical model recommendations:

Haiku – pure data extraction, format conversion, simple validation.

Sonnet – logical analysis, design review, debugging.

Opus – innovative design or strategic decisions.
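
These recommendations can be captured as a lookup table so that the choice is made once rather than on every task. A minimal sketch: the task-kind labels and the `pick_model` function are assumptions for illustration, and the default of "sonnet" is a design choice rather than something the article prescribes.

```python
# Task kinds are hypothetical labels derived from the recommendations above.
MODEL_BY_TASK = {
    "data_extraction": "haiku",
    "format_conversion": "haiku",
    "simple_validation": "haiku",
    "logical_analysis": "sonnet",
    "design_review": "sonnet",
    "debugging": "sonnet",
    "innovative_design": "opus",
    "strategic_decision": "opus",
}


def pick_model(task_kind: str, default: str = "sonnet") -> str:
    """Return the recommended model for a task kind, or `default` if unknown."""
    return MODEL_BY_TASK.get(task_kind, default)
```

Falling back to the mid-tier model for unknown task kinds trades a little cost for a lower risk of under-powered answers.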

Cost predictability

Traditional multi‑agent setups hide token consumption and can spike unexpectedly. The agentGroup approach provides:

Predictable token usage per step.

Explicit model selection per sub‑task.

Real‑time token breakdown for each interaction.

User‑driven optimization loops.
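
The article does not show how the per-interaction breakdown is produced. One way to picture it is a small ledger that records token counts per step and sums them per model. This is a sketch under that assumption: `TokenLedger` is hypothetical, and the counts would have to come from whatever usage figures your setup exposes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Accumulates (step, model, input_tokens, output_tokens) records."""
    entries: list = field(default_factory=list)

    def record(self, step: str, model: str,
               input_tokens: int, output_tokens: int) -> None:
        self.entries.append((step, model, input_tokens, output_tokens))

    def totals_by_model(self) -> dict:
        """Total tokens (input + output) consumed by each model."""
        totals = defaultdict(int)
        for _step, model, tokens_in, tokens_out in self.entries:
            totals[model] += tokens_in + tokens_out
        return dict(totals)
```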

Limitations

Passive notifications – users must poll /status to see updates.

File dependency – a corrupted JSON file breaks the workflow.

Learning curve – newcomers need to understand four roles and the custom CLI.

Response latency – file I/O adds overhead compared with direct API calls.
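
The file-dependency risk can be softened by loading shared files defensively. The sketch below falls back to a default when a file is missing or unparseable; `load_json_or_default` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path


def load_json_or_default(path: Path, default: dict) -> dict:
    """Load a JSON object, returning a copy of `default` if the file is unusable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(default)
    # A valid JSON value that is not an object is also treated as corrupt.
    return data if isinstance(data, dict) else dict(default)
```

Since the shared files live in Git, restoring the last committed version is an alternative to silently continuing with a default.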

Comparison with Claude’s official Agent Team

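In each row below, the agentGroup approach is listed first and Claude’s official Agent Team second.

Collaboration method: file system vs. native API calls.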

Notification: passive polling vs. active push.

Closed‑loop: manual intervention required vs. automatic.

Customizability: fully controllable vs. platform‑imposed limits.

Cost control: fine‑grained token budgeting vs. standard pricing.

Technical barrier: configuration needed vs. out‑of‑the‑box.

When to use this architecture

Long‑running projects that need persistent state.

Token‑sensitive workloads where every token matters.

Scenarios requiring deep customization of AI behavior.

Learning how multi‑AI collaboration works under the hood.

It is less suitable for one‑off quick tasks, real‑time collaboration, or latency‑critical applications.

Getting started

Create a single Claude instance and define a dedicated CLAUDE.md persona.

Add a JSON status file to replace repetitive queries.

Write a simple shell script that checks file mtimes before reading.

Scale by adding more agents, designing a notification schema, and instrumenting token‑monitoring scripts.

Conclusion

In the author’s experiments, specializing AI agents and enforcing observable contracts sharply reduced token waste (by up to 85% in some scenarios) and made behavior more reliable and costs more predictable. The framework demonstrates that AI is most valuable when treated as a set of specialized tools rather than a universal assistant.

Project repository: https://github.com/yezannnnn/agentGroup
