Breaking the Multi‑Robot Barrier: Sequential World‑Model Decomposition (ICLR 2026)

SeqWM introduces a sequential causal decomposition of joint dynamics, allowing each robot to model its marginal contribution conditioned on prior agents, which simplifies world‑model learning, enables intent‑sharing planning via MPPI, and achieves superior performance in challenging simulation benchmarks and real‑robot tests.

Machine Heart

Recent advances in decision‑coupled world models and model‑based reinforcement learning have enabled single‑robot planning, but extending these methods to multiple robots introduces joint dynamics that are hard to model.

The core difficulty is that the world is no longer driven by a single policy; multiple agents simultaneously affect the environment, leading to (1) complex causal structure and gradient conflicts, and (2) a broken decision‑world feedback loop where prediction errors accumulate.

The key observation is that the change multiple robots produce in the environment can be modeled as the robots acting on it one after another, rather than all at once.

Based on this observation, the SeqWM framework decomposes the joint dynamics into a series of sequentially conditioned state‑transition processes. Each robot learns its marginal causal contribution given the actions of preceding robots, turning the original joint dynamics into a chain of conditional predictions.
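One way to read this chain of conditional predictions is as a composition of per-robot transition functions. The notation below is an assumption made for illustration and is not taken from the paper: s_t is the environment state at time t, a_t^i is the action of the i-th robot in the ordering, f_i is that robot's learned transition model, and the models are treated as deterministic for simplicity.

```latex
\begin{aligned}
s_t^{(0)} &= s_t, \\
s_t^{(i)} &= f_i\!\left(s_t^{(i-1)},\, a_t^{i}\right), \quad i = 1, \dots, n, \\
s_{t+1} &= s_t^{(n)}.
\end{aligned}
```

Under this reading, robot i never has to model the full joint action. The intermediate state s_t^{(i-1)} already summarizes the predicted effect of robots 1 through i-1, so f_i only has to capture what robot i adds on top of it.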

During trajectory prediction, every robot maintains an independent world model that predicts only its own effect on the environment, while conditioning on the predicted outcomes of earlier robots. This modular structure reduces modeling complexity and improves scalability.
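The sequential prediction step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SeqWM implementation: `AgentModel`, `sequential_step`, and `rollout` are hypothetical names, and each per-robot model is a fixed random linear map standing in for a learned network.

```python
import numpy as np

class AgentModel:
    """Hypothetical per-robot world model (toy linear stand-in for a learned network)."""

    def __init__(self, state_dim, action_dim, rng):
        # B maps this robot's action to its additive effect on the state.
        self.B = 0.1 * rng.standard_normal((state_dim, action_dim))

    def step(self, state, action):
        # `state` already includes the predicted effects of earlier robots.
        return state + self.B @ action


def sequential_step(models, state, actions):
    """Apply each robot's model in order; each one conditions on the previous output."""
    for model, action in zip(models, actions):
        state = model.step(state, action)
    return state


def rollout(models, state, action_seqs):
    """Predict a trajectory; action_seqs has shape (horizon, n_robots, action_dim)."""
    trajectory = [state]
    for actions in action_seqs:
        state = sequential_step(models, state, actions)
        trajectory.append(state)
    return np.stack(trajectory)
```

Calling `rollout(models, s0, action_seqs)` returns an array with `horizon + 1` states. Adding another robot only means appending one more model to the list, which is the modularity the paragraph above describes.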

For action planning, SeqWM employs Model Predictive Path Integral (MPPI) control. Robots plan in order, sharing their predicted trajectories so that later agents can incorporate the intents of earlier ones, achieving explicit intent sharing and stronger cooperation.
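The ordered planning loop can be illustrated with a small MPPI sketch. Everything here is an assumption made for the example rather than a detail from the paper: the robots are 2D points with simple velocity-integrating dynamics in place of a learned world model, the cost is distance to a goal plus a penalty for coming close to an earlier robot's shared plan, and `mppi_plan` and `sequential_plan` are hypothetical names.

```python
import numpy as np

def mppi_plan(start, goal, others, horizon=15, samples=256,
              sigma=0.5, lam=1.0, iters=5, rng=None):
    """Plan one robot's actions with MPPI.

    `others` holds (horizon + 1, 2) trajectories shared by earlier robots.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    mean = np.zeros((horizon, 2))  # nominal action sequence
    for _ in range(iters):
        noise = sigma * rng.standard_normal((samples, horizon, 2))
        actions = mean[None] + noise  # (samples, horizon, 2)
        # Toy dynamics (assumption): position integrates velocity, dt = 0.1.
        traj = start + 0.1 * np.cumsum(actions, axis=1)
        # Cost: summed distance to the goal along the rollout.
        cost = np.linalg.norm(traj - goal, axis=-1).sum(axis=1)
        # Intent sharing: penalize steps that come near earlier robots' plans.
        for other in others:
            dist = np.linalg.norm(traj - other[None, 1:], axis=-1)
            cost += 10.0 * (dist < 0.3).sum(axis=1)
        # MPPI update: exponentially weighted average of sampled actions.
        weights = np.exp(-(cost - cost.min()) / lam)
        weights /= weights.sum()
        mean = np.einsum("k,khd->hd", weights, actions)
    plan = np.vstack([start, start + 0.1 * np.cumsum(mean, axis=0)])
    return mean, plan


def sequential_plan(starts, goals, **kwargs):
    """Robots plan in order; each sees the plans already shared by earlier ones."""
    shared = []
    for start, goal in zip(starts, goals):
        _, plan = mppi_plan(np.asarray(start, dtype=float),
                            np.asarray(goal, dtype=float), shared, **kwargs)
        shared.append(plan)
    return shared
```

For instance, `sequential_plan([(0.0, 0.0), (1.0, -1.0)], [(2.0, 0.0), (1.0, 1.0)])` plans two robots whose straight-line paths cross. The second robot receives the first robot's planned trajectory through `shared` and is penalized for intersecting it, which is a simplified stand-in for the intent sharing described above.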

Simulation experiments were conducted on two challenging multi‑robot benchmarks: Bi‑DexHands (dual dexterous hand manipulation) and Multi‑Quadruped (cooperative quadruped tasks). In all tasks SeqWM outperformed existing baselines, achieving higher success rates and better sample efficiency.

Emergent cooperative behaviors were also observed, such as predictive adaptation, in which a robot anticipates a partner’s future actions, and role division, in which robots take on complementary functions without being explicitly designed to.

To validate sim‑to‑real transfer, SeqWM was deployed on a Unitree Go2‑W platform for tasks including box pushing, narrow‑door passage, and guiding a target robot. Real‑world results closely matched simulation, confirming practical applicability.

In summary, SeqWM offers a novel sequential causal decomposition for multi‑robot world modeling, simplifying learning, enabling intent‑shared planning, and delivering state‑of‑the‑art performance in both simulated and real environments.

