MetaAgent-X Enables Agents to Self‑Evolve: A New Paradigm for Native Collaboration

MetaAgent‑X integrates system design and execution within a single base model, using hierarchical rollout and stagewise co‑evolution to jointly train Designer and Executor roles, and achieves significant gains over single‑agent and prior multi‑agent baselines on math and code benchmarks.

Machine Learning Algorithms & Natural Language Processing

Jun 1, 2026

Multi‑agent systems have become a common paradigm for large‑model applications, but most approaches only orchestrate agents externally while keeping the execution model frozen, which limits overall robustness.

MetaAgent‑X addresses this limitation by asking whether a single base model can both learn to design a multi‑agent system and to execute tasks within that system, improving both abilities through reinforcement learning.

The framework splits the task into two roles. The Designer generates a lightweight Python script that specifies the required agents, their interaction order, tool usage, memory handling, and termination conditions. The Executor runs the instantiated system to perform reasoning, testing, collaboration, and correction on math or code problems.
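To make the Designer's output more concrete, here is a minimal sketch of what such a design script might look like. The article does not show the actual script format, so every name below (`AgentSpec`, `SystemDesign`, `run_design`, `stop_token`, and the stub model) is a hypothetical stand-in, and tool usage is reduced to a plain list of tool names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical schema: the real MetaAgent-X script format is not given in the article.
@dataclass
class AgentSpec:
    name: str
    role_prompt: str
    tools: List[str] = field(default_factory=list)  # tool names only, for illustration


@dataclass
class SystemDesign:
    agents: List[AgentSpec]
    order: List[str]               # interaction order, by agent name
    max_rounds: int                # hard termination condition
    stop_token: str = "APPROVED"   # early termination when the last reply contains this


ModelFn = Callable[[str, str, List[str]], str]


def run_design(design: SystemDesign, task: str, call_model: ModelFn) -> List[str]:
    """Run the agents in order over a shared memory until a termination condition holds."""
    by_name = {agent.name: agent for agent in design.agents}
    memory: List[str] = []
    for _ in range(design.max_rounds):
        for name in design.order:
            reply = call_model(by_name[name].role_prompt, task, memory)
            memory.append(f"{name}: {reply}")
        if design.stop_token in memory[-1]:
            break
    return memory


# A "reflection" structure: a solver followed by a critic.
reflection = SystemDesign(
    agents=[
        AgentSpec("solver", "Solve the problem step by step."),
        AgentSpec("critic", "Check the solution and approve it if correct."),
    ],
    order=["solver", "critic"],
    max_rounds=3,
)


def stub_model(role_prompt: str, task: str, memory: List[str]) -> str:
    """Stand-in for the Executor model: the critic approves on its second look."""
    if role_prompt.startswith("Check"):
        return "APPROVED" if len(memory) >= 3 else "Please recheck."
    return f"answer to {task}"


transcript = run_design(reflection, "2+2", stub_model)
```

In this sketch the Executor role corresponds to whatever model sits behind `call_model`; in MetaAgent‑X that is the same base model that wrote the design.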

Training follows an Executor‑Designer Hierarchical Rollout: for each input problem, the Designer samples M candidate designs and the Executor runs each design N times, forming a two‑level tree. Rewards from the environment are attributed separately to design and execution trajectories. To avoid interference between the two roles, MetaAgent‑X employs stagewise co‑evolution, alternating an Executor‑only phase (learning to solve tasks under the current designs) with a Designer‑only phase (learning to generate better designs from stable execution feedback). In the main experiments, the phases switch every 30 steps.
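The two‑level tree and the alternating schedule can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the article does not give the attribution formula, so scoring a design by the mean reward of its executions and scoring each execution relative to its siblings are assumptions here, as is the choice to start with the Executor phase. All function names are hypothetical.

```python
import itertools
from statistics import mean
from typing import Callable, List, Tuple

Tree = List[Tuple[str, List[float]]]


def hierarchical_rollout(
    problem: str,
    sample_design: Callable[[str], str],
    execute: Callable[[str, str], float],
    M: int = 4,
    N: int = 4,
) -> Tree:
    """Build the two-level tree: M sampled designs, each executed N times."""
    tree: Tree = []
    for _ in range(M):
        design = sample_design(problem)
        rewards = [execute(design, problem) for _ in range(N)]
        tree.append((design, rewards))
    return tree


def attribute_rewards(tree: Tree) -> Tuple[List[float], List[List[float]]]:
    """Assumed attribution rule (not from the article):
    - a design's score is the mean reward of its N executions,
      centered on the mean score across the M designs;
    - an execution's score is its reward minus its own design's mean."""
    design_scores = [mean(rewards) for _, rewards in tree]
    baseline = mean(design_scores)
    design_adv = [score - baseline for score in design_scores]
    exec_adv = [
        [r - score for r in rewards]
        for (_, rewards), score in zip(tree, design_scores)
    ]
    return design_adv, exec_adv


def active_role(step: int, phase_len: int = 30) -> str:
    """Stagewise schedule: only one role is updated per phase.
    Starting with the Executor phase is an assumption."""
    return "executor" if (step // phase_len) % 2 == 0 else "designer"


# Toy, deterministic stand-ins for the Designer and the environment reward.
designs = itertools.cycle(["reflection", "single"])
tree = hierarchical_rollout(
    "toy problem",
    sample_design=lambda problem: next(designs),
    execute=lambda design, problem: 1.0 if design == "reflection" else 0.0,
)
design_adv, exec_adv = attribute_rewards(tree)
```

Separating the two advantage signals is what lets a training loop update only the Designer's tokens or only the Executor's tokens, depending on what `active_role(step)` returns.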

Experiments use Qwen3‑4B and Qwen3‑8B as base models, with parameters shared between the two roles. Models are cold‑started with supervised fine‑tuning (SFT) on 3K design and 8K execution samples generated by DeepSeek‑V3.2, then trained with reinforcement learning on a mix of math (Polaris‑53K) and code (APPS, CodeContests) datasets. Benchmarks include AIME24, AIME25, OlympiadBench, APPS, LiveCodeBench‑v6, and CodeContests. Baselines cover a single‑agent setup, GRPO, AFlow, ADAS, ScoreFlow, MaAS, and AFM‑Coder.

Results show consistent improvements. On Qwen3‑8B, MetaAgent‑X achieves an average score of 38.33% (+11.17 percentage points over the single‑agent baseline), and on Qwen3‑4B an average of 34.18% (+12.80 percentage points). It also surpasses strong multi‑agent baselines: on the 8B model, for example, it scores 38.33% against 32.22% for MaAS. Reinforcement learning lifts performance further beyond the SFT stage (8B: 32.17% → 38.33%).
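The single‑agent baseline averages are not quoted directly, but they follow from the reported scores and gains. The short calculation below only rearranges numbers stated above; the variable names are illustrative.

```python
# Reported averages (%) and gains (percentage points) from the article.
reported = {
    "Qwen3-8B": {"metaagent_x": 38.33, "gain_over_single_agent": 11.17},
    "Qwen3-4B": {"metaagent_x": 34.18, "gain_over_single_agent": 12.80},
}

# Implied single-agent baseline = MetaAgent-X score minus the stated gain.
implied_baseline = {
    model: round(v["metaagent_x"] - v["gain_over_single_agent"], 2)
    for model, v in reported.items()
}

# Margins on the 8B model, again from the stated figures only.
gain_over_maas_8b = round(38.33 - 32.22, 2)  # versus MaAS
gain_over_sft_8b = round(38.33 - 32.17, 2)   # RL stage versus SFT stage
```

This gives implied single‑agent averages of 27.16% (8B) and 21.38% (4B), and 8B margins of 6.11 points over MaAS and 6.16 points over the SFT checkpoint.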

Ablation studies reveal that hierarchical rollout with M=4, N=4 outperforms M=8, N=1, confirming the value of multiple executions per design. Stagewise co‑evolution beats simultaneous training, Designer‑only training, and Executor‑only training, delivering the most stable and highest scores. Phase‑length analysis shows that a 30‑step stage works best; shorter phases cause collapse after roughly 150 steps. Shared‑parameter training outperforms separate‑parameter training on the AIME benchmarks (40.0% vs 33.3%).
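One possible intuition for why several executions per design help is that the Designer's feedback becomes less noisy. The sketch below is not the paper's analysis: it assumes a binary pass/fail reward, independent executions, and an arbitrary per‑run success rate `p`, and compares the standard error of a design's estimated score under the two rollout settings.

```python
import math


def score_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n independent pass/fail rewards
    with success probability p (assumed reward model)."""
    return math.sqrt(p * (1.0 - p) / n)


configs = {"M=4, N=4": (4, 4), "M=8, N=1": (8, 1)}
p = 0.5  # assumed per-run success rate, chosen only for illustration

summary = {
    name: {"executions": m * n, "stderr": score_stderr(p, n)}
    for name, (m, n) in configs.items()
}
```

Under these assumptions, N=4 halves the noise in each design's score relative to N=1. Note that the two settings as described also differ in total executions per problem (16 versus 8).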

Analysis of generated structures after RL shows task‑aware routing: for difficult math problems the Designer selects a “reflection” structure (solver + critic) over 70 % of the time, while for code tasks it prefers a “single” structure to reduce overhead, and retains “ensemble” structures for certain competition‑style problems.

Limitations include evaluation only on 4 B and 8 B models and a lack of scaling studies to larger models, longer training budgets, or broader task families.

Overall, MetaAgent‑X demonstrates that end‑to‑end trainable multi‑agent systems can move beyond external workflow search, enabling foundation models to internally learn when and how to collaborate, a direction that may become central to future code assistants, research assistants, and general‑purpose intelligent agents.
