Cut Token Costs by 68% with Dynamic Multi‑Agent Collaborative Coding

The paper introduces AgentConductor, a 3-billion-parameter orchestrator that generates adaptive YAML-based multi-agent topologies and dynamically re-plans when code errors occur, achieving a 14.6% accuracy gain and up to a 68% token-cost reduction over existing static agent pipelines.

Machine Heart

Amid the rise of Vibe Coding, software development is shifting from "humans write code" to "humans direct agents to write code". Systems such as Claude Code and OpenClaw let agents handle coding, debugging, and entire task flows, but a single model still hits its limits on system-level or competition-level problems, prompting a shift toward multi-agent collaboration.

Current approaches fall into two categories. The "agent teams" style (e.g., Claude Code) runs multiple models in parallel to raise the capability ceiling, but incurs extremely high token costs. The "skill-composition and workflow orchestration" style (e.g., OpenClaw) is more controllable from an engineering standpoint but still relies on predefined rules or static processes.

These static structures solve "how to organize calls" rather than "how to adapt collaboration to the task", leading to redundant agent communication, massive token consumption, and high autonomous programming costs.

The Shanghai Jiao Tong University i-WiN team proposes AgentConductor, a 3B-parameter commander trained via reinforcement learning. It first assesses task difficulty and then generates a YAML-encoded interaction topology: simple tasks get lightweight teams, while complex tasks get richer graphs, adaptively matching capability to cost.

AgentConductor is not a one-shot planner. When generated code fails, the commander ingests the error message and its historical trajectory, then regenerates the topology end to end, exploring new collaboration patterns. Experiments show a 14.6% increase in code-generation accuracy and a 68% reduction in token usage.
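The failure-driven loop described above can be sketched as a simple control flow. This is an illustrative reconstruction, not the paper's API: the function names, the `Result` type, and the round limit are all assumptions; the only thing taken from the article is the pattern of feeding the error plus the full trajectory back into a fresh topology generation.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Outcome of executing one generated topology (illustrative type)."""
    passed: bool
    error: str = ""


def run_with_replanning(generate, execute, task, max_rounds=3):
    """Hypothetical re-planning loop in the spirit of AgentConductor.

    generate(task, trajectory) -> YAML topology string
    execute(topology, task)   -> Result
    """
    trajectory = []  # accumulated (topology, error) history
    result = Result(passed=False, error="no attempt")
    for _ in range(max_rounds):
        topology = generate(task, trajectory)
        result = execute(topology, task)
        if result.passed:
            break
        # Re-plan end to end: the commander sees the whole history,
        # not just the last error, so it can try a new collaboration pattern.
        trajectory.append((topology, result.error))
    return result, trajectory
```

The key design point is that `generate` receives the entire trajectory, so regeneration is a fresh structured decision rather than a local patch of the previous graph.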

Figure 1

Key innovations include a YAML‑based topology representation that is readable, programmatically verifiable, and directly generable by LLMs, and an interaction form that blends chain, tree, and fully‑connected advantages, supporting intra‑layer parallelism and cross‑layer communication while allowing any agent to link to previous nodes, thus reducing unnecessary communication overhead.
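To make the topology representation concrete, here is an illustrative YAML fragment in the spirit of that encoding. The field names and schema are guesses for exposition, not the published format; what it is meant to show is intra-layer parallelism, cross-layer communication, and a link back to an earlier node.

```yaml
# Illustrative topology (schema is an assumption, not the paper's).
task_difficulty: hard
layers:
  - id: 0
    agents: [planner]
  - id: 1
    agents: [coder_a, coder_b]   # intra-layer parallelism
  - id: 2
    agents: [tester]
edges:
  - {from: planner, to: coder_a}
  - {from: planner, to: coder_b}
  - {from: coder_a, to: tester}
  - {from: coder_b, to: tester}
  - {from: planner, to: tester}  # cross-layer link back to an earlier node
```

Because the structure is plain YAML, it can be parsed and validated programmatically before any agent is invoked, which is what makes it both LLM-generable and verifiable.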

The training paradigm uses two stages. First, supervised fine‑tuning (SFT) on 4,500 high‑quality topologies generated by GPT‑4o provides a topology prior across three difficulty levels. Second, a GRPO‑based multi‑round RL stage treats error feedback and topology text as a trajectory, optimizing a composite reward that balances token cost and code quality.
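The composite reward in the RL stage can be sketched as a weighted trade-off between code quality and token spend. The exact weighting, normalization, and inputs are not given in the article, so everything below (the `alpha` weight, the linear clipped token penalty, the test-pass-rate quality signal) is an assumption chosen for clarity.

```python
def composite_reward(tests_passed, tests_total, tokens_used,
                     token_budget, alpha=0.8):
    """Illustrative composite reward in [0, 1]: alpha weights
    correctness, (1 - alpha) weights token efficiency.
    (Hypothetical form; the paper's actual reward is not reproduced here.)
    """
    quality = tests_passed / tests_total
    # Linear token penalty, clipped at zero so budget overruns
    # cannot drive the reward negative.
    efficiency = max(0.0, 1.0 - tokens_used / token_budget)
    return alpha * quality + (1 - alpha) * efficiency
```

Under this shape, a topology that passes all tests but burns the whole budget still scores below one that passes all tests cheaply, which is the behavior the article attributes to the training objective.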

A topology-density evaluation function maps task difficulty to token cost and graph density, formally accounting for node count, edge density, and graph depth d relative to the maximum prompt length m. This contrasts with traditional methods that approximate interaction density merely by matrix rank, losing multi-agent semantics.
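One way to see how those ingredients combine is the sketch below. The paper's actual formula is not reproduced in the article, so this function is only an assumption that wires together the named quantities: node count, edge density, depth d, and maximum prompt length m.

```python
def topology_density(n_nodes, n_edges, depth, max_prompt_len):
    """Hypothetical density score combining the ingredients the paper
    names; the real evaluation function may differ in form."""
    # Edge density: realized edges over the maximum possible edges
    # among n nodes (treating the topology as an undirected graph).
    max_edges = n_nodes * (n_nodes - 1) / 2
    edge_density = n_edges / max_edges if max_edges else 0.0
    # Depth normalized by the prompt budget: deeper chains accumulate
    # longer contexts, so depth is weighed relative to m.
    depth_ratio = depth / max_prompt_len
    return edge_density * (1.0 + depth_ratio) * n_nodes
```

The point of scoring the graph itself, rather than a matrix rank, is that two topologies with the same rank can imply very different communication volumes; counting nodes, edges, and depth keeps that multi-agent semantics visible.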

Figure 2

Experiments on three competition‑level benchmarks (APPS, LiveCodeBench, CodeContests) and two basic code datasets (HumanEval, MBPP) using Qwen‑2.5‑3B‑Instruct demonstrate that AgentConductor surpasses the strongest baselines on APPS, cuts completion token consumption by up to 68%, and attains the highest topology sparsity. The system adapts granularity: easy tasks use 3–4 nodes, hard tasks expand to 8–10 nodes, whereas most baselines keep a fixed density.

Table 1

The work concludes that multi‑agent systems should be treated as learnable, evolvable structured decision processes. By unifying task difficulty, execution feedback, and token cost within a reinforcement‑learning framework, AgentConductor achieves simultaneous gains in accuracy and efficiency, marking a transition from static workflows to dynamic ecosystems.

Paper: "AgentConductor: Topology Evolution for Multi‑Agent Competition‑Level Code Generation" (arXiv:2602.17100).

Tags: Reinforcement learning · Multi-agent collaboration · LLM code generation · AgentConductor · code generation benchmarks · token cost reduction · YAML topology
Written by Machine Heart, a professional AI media and industry service platform.