MetaAgent-X Enables Self‑Evolving Agents for Native Collaboration
MetaAgent-X tackles the limitation of fixed‑executor multi‑agent systems by jointly training a Designer that creates lightweight Python‑based collaboration scripts and an Executor that runs them, using hierarchical rollouts and stagewise co‑evolution to improve both design and execution across math and code benchmarks.
Recent multi‑agent systems often rely on external orchestration, which limits the underlying model’s ability to improve from task feedback because the executor remains frozen. MetaAgent‑X addresses this by training a single base model to both design a multi‑agent workflow and execute it.
The framework splits the process into two roles. Given a task, the Designer generates a lightweight Python script describing role division, interaction order, tool usage, memory handling, and termination conditions. The Executor then runs the instantiated system to perform reasoning, testing, collaboration, and correction. This design‑execute loop is recorded and used to update the model.
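As an illustration of what a Designer-generated script might contain, here is a minimal sketch of a solver-reviewer loop. The summary does not show the paper's actual script format, so every name here (`run_collaboration`, `call_agent`, `stub_agent`, the role names) is hypothetical, and tool usage is omitted for brevity.

```python
from typing import Callable, List


def run_collaboration(
    task: str,
    call_agent: Callable[[str, str], str],  # (role, prompt) -> reply; hypothetical Executor hook
    max_rounds: int = 3,
) -> str:
    """Hypothetical collaboration script: a solver proposes, a reviewer checks."""
    memory: List[str] = []  # memory handling: shared log of earlier rounds
    answer = ""
    for _ in range(max_rounds):  # termination condition 1: round budget
        context = "\n".join(memory)
        # Interaction order: solver first, then reviewer.
        answer = call_agent("solver", f"{task}\n{context}")
        verdict = call_agent("reviewer", f"{task}\nProposed answer: {answer}")
        memory.append(f"answer={answer}; review={verdict}")
        if verdict.strip().upper().startswith("ACCEPT"):  # termination condition 2
            break
    return answer


def stub_agent(role: str, prompt: str) -> str:
    """Toy stand-in for the Executor model, used only to make the sketch runnable."""
    if role == "solver":
        # Revise the answer once reviewer feedback appears in the prompt.
        return "42" if "review=" in prompt else "41"
    return "ACCEPT" if "Proposed answer: 42" in prompt else "REJECT: recheck arithmetic"


final_answer = run_collaboration("What is 6 * 7?", stub_agent)
```

In a real system the `call_agent` hook would route each prompt to the Executor model; the stub only demonstrates the control flow of one correction round.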
MetaAgent‑X introduces Executor‑Designer Hierarchical Rollout: for each problem the Designer samples M candidate designs, and each design is executed N times, forming a two‑level tree. The main experiments use M=4 and N=4; averaging rewards over a design's N executions gives a more stable estimate of that design's quality than a single run.
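The two-level tree can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_design` and `execute` are hypothetical callables standing in for the Designer and Executor, and the toy versions below model a design as nothing more than a success probability.

```python
import random
from statistics import mean


def hierarchical_rollout(task, sample_design, execute, m=4, n=4):
    """Sample m designs and execute each n times.

    Returns a list of (design, rewards, mean_reward) tuples, one per design.
    """
    tree = []
    for _ in range(m):
        design = sample_design(task)
        rewards = [execute(design, task) for _ in range(n)]
        # The mean over n executions serves as the design-level reward estimate.
        tree.append((design, rewards, mean(rewards)))
    return tree


# Toy stand-ins (assumptions for illustration only).
_rng = random.Random(0)


def toy_sample_design(task):
    return _rng.choice([0.2, 0.5, 0.8])  # a "design" is just its success probability


def toy_execute(design, task):
    return 1.0 if _rng.random() < design else 0.0  # binary task reward


tree = hierarchical_rollout("toy task", toy_sample_design, toy_execute)
for design, rewards, avg in tree:
    print(design, rewards, avg)
```

The per-execution rewards would feed the Executor update, while the per-design means would feed the Designer update.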
To avoid noisy credit assignment, the paper proposes Stagewise Co‑evolution. Designer and Executor are not trained simultaneously; instead, training alternates between phases that first update the Executor using execution trajectories and then update the Designer using design trajectories. Experiments show that 30‑step phases yield the most stable training, while 1‑step switching causes collapse.
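The alternating schedule can be expressed as a small helper. This is a sketch under the assumption that phases have equal length and that the Executor phase comes first; the function name `active_role` is hypothetical.

```python
def active_role(step: int, phase_len: int = 30) -> str:
    """Return which role receives gradient updates at a 0-indexed training step.

    Phases alternate every `phase_len` steps, starting with the Executor.
    """
    return "executor" if (step // phase_len) % 2 == 0 else "designer"


# With 30-step phases: steps 0-29 train the Executor, 30-59 the Designer, and so on.
schedule = [active_role(s) for s in (0, 29, 30, 59, 60)]
print(schedule)
```

Setting `phase_len=1` reproduces the per-step switching that the article reports as unstable.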
Training uses Qwen3 4B and Qwen3 8B as base models with a shared‑parameter strategy (the same parameters serve both roles, distinguished by prompts). A cold‑start SFT stage leverages 3K Designer and 8K Executor samples generated by DeepSeek‑V3.2, followed by reinforcement learning on mixed math (Polaris‑Dataset‑53K) and code (APPS, CodeContests) data.
Evaluation spans six benchmarks (AIME24, AIME25, OlympiadBench, APPS, LiveCodeBench‑v6, CodeContests). MetaAgent‑X improves average scores to 38.33% (Qwen3 8B) and 34.18% (Qwen3 4B), outperforming single‑agent baselines by 11–13 percentage points and beating strong multi‑agent baselines such as MaAS by 6 points.
Ablation studies reveal that hierarchical rollouts (M=4, N=4) outperform more design samples with fewer executions (M=8, N=1), and that stagewise co‑evolution outperforms simultaneous or single‑role training. Shared‑parameter training also surpasses separate‑parameter setups, indicating beneficial cross‑role knowledge transfer.
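One intuition for why repeated executions per design can help, under the simplifying assumption (not stated in the source) that each execution of a fixed design yields an independent binary reward with success probability p:

```latex
\hat{r}_N = \frac{1}{N}\sum_{j=1}^{N} r_j, \qquad
\operatorname{Var}\!\left(\hat{r}_N\right) = \frac{p(1-p)}{N}
% N = 1:  Var = p(1-p)
% N = 4:  Var = p(1-p)/4
```

With N=1 the design-level reward is a single 0/1 outcome, while N=4 cuts the variance of that estimate by a factor of four. The two ablation settings also differ in total executions per problem (16 versus 8), so this is only an intuition, not an account of the reported gap.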
Post‑RL analysis shows the Designer adapts structures to task difficulty: reflection‑style pipelines dominate hard math problems, single‑agent pipelines are chosen for straightforward code tasks, and ensemble structures persist for competitive scenarios. Improvements stem from both better design choices and a more capable Executor.
MetaAgent‑X demonstrates that end‑to‑end training can endow foundation models with native multi‑agent abilities: knowing when to collaborate, how to organize agents, and how to refine designs from execution feedback. This points toward future agentic systems that internalize orchestration rather than relying on external prompt engineering.
Limitations include evaluation only on 4B/8B models and a limited set of tasks; larger scales and broader domains remain to be explored.
Machine Learning Algorithms & Natural Language Processing
Focused on frontier AI technologies, empowering AI researchers' progress.
