How Sequential World Models Enable Scalable Multi‑Robot Cooperation

SeqWM introduces a sequential causal decomposition of multi‑robot dynamics: each robot models its marginal contribution conditioned on the robots that precede it. This simplifies learning, improves sample efficiency, and yields natural collaborative behaviors. In simulation (Bi‑DexHands, Multi‑Quadruped) and in real‑world tests on the Unitree Go2‑W, it is reported to outperform prior methods.

Data Party THU

Background

Decision‑coupled world models and model‑based reinforcement learning achieve strong results on single‑robot tasks. Extending them to multiple robots introduces a joint‑dynamics problem: every change in the environment is caused by several agents acting simultaneously, which makes the dynamics hard to learn.

SeqWM: Sequential World Model

SeqWM treats the multi‑robot transition as a sequence of conditional updates. Each robot learns a marginal world model that predicts the environment change given the actions of all preceding robots.

The evolution of a multi‑robot world can be modeled as robots acting on the environment one after another.

Sequential Causal Decomposition

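The joint dynamics p(s′|s,a₁,…,aₙ) is factorized as ∏ₖ p(sₖ′|sₖ, aₖ, a₁,…,aₖ₋₁), where k indexes the robots in their acting order and sₖ, sₖ′ denote the environment state before and after robot k's contribution. This reduces one high‑dimensional prediction problem to a series of simpler conditional predictions.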
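
To make the factorization concrete, here is a minimal sketch of how marginal updates could be chained into a joint transition. It is not taken from the SeqWM code: `marginal_step`, `sequential_transition`, and the damping rule are hypothetical stand‑ins for learned marginal models, chosen only so that each update visibly depends on the preceding robots' actions.

```python
import numpy as np

def marginal_step(state, action, preceding_actions):
    """Hypothetical marginal model for one robot.

    Returns the intermediate state after this robot's contribution.
    Toy rule (an assumption): a push is damped by how hard the
    robots earlier in the order have already pushed.
    """
    prior_effort = sum(np.linalg.norm(a) for a in preceding_actions)
    damping = 1.0 / (1.0 + prior_effort)
    return state + damping * action

def sequential_transition(state, actions):
    """Compose the marginal updates in a fixed robot order."""
    intermediate = np.asarray(state, dtype=float)
    for k, action in enumerate(actions):
        # Robot k sees the intermediate state and the actions a_1..a_{k-1}.
        intermediate = marginal_step(intermediate, np.asarray(action, dtype=float), actions[:k])
    return intermediate

s0 = np.zeros(2)
joint_actions = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
s_next = sequential_transition(s0, joint_actions)
print(s_next)
```

Each call predicts only one robot's effect, so no single model has to represent the full joint action space.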

Trajectory Prediction

Each robot maintains an independent world model.

The model captures only the robot’s marginal contribution to the environment.

Later robots condition their predictions on the earlier robots’ predicted trajectories.
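
The three points above can be sketched as a chain of rollouts in which each robot adds its own contribution on top of the trajectory predicted by the robots before it. This is an illustrative toy under stated assumptions, not the authors' architecture: `MarginalWorldModel`, its `gain` parameter, and the additive dynamics are all hypothetical.

```python
import numpy as np

class MarginalWorldModel:
    """Hypothetical per-robot model that predicts only this robot's effect."""

    def __init__(self, gain):
        self.gain = gain  # assumed strength of this robot's influence

    def rollout(self, initial_state, own_actions, earlier_trajectory=None):
        """Predict a state trajectory of shape (horizon, state_dim)."""
        own_actions = np.asarray(own_actions, dtype=float)
        horizon = len(own_actions)
        if earlier_trajectory is None:
            # First robot in the order: the baseline is an unchanged environment.
            base = np.tile(np.asarray(initial_state, dtype=float), (horizon, 1))
        else:
            # Later robots start from what earlier robots already predicted.
            base = np.asarray(earlier_trajectory, dtype=float)
        contribution = self.gain * np.cumsum(own_actions, axis=0)
        return base + contribution

models = [MarginalWorldModel(gain=1.0), MarginalWorldModel(gain=0.5)]
plans = [np.ones((3, 2)), np.ones((3, 2))]

trajectory = None
for model, plan in zip(models, plans):
    trajectory = model.rollout(np.zeros(2), plan, trajectory)
print(trajectory)
```

In a learned system the additive rule would be replaced by a network, but the data flow stays the same: the only thing passed down the order is the predicted trajectory.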

Planning with MPPI

SeqWM employs Model‑Predictive Path Integral (MPPI) control. Robots plan sequentially, share their predicted trajectories, and thus achieve explicit intent sharing that enhances coordination.
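
As a rough illustration of sequential planning with intent sharing, the sketch below runs a bare‑bones MPPI update for each robot in turn and passes the predicted trajectory on to the next robot. Everything specific here is an assumption made for the example: the additive toy dynamics, the terminal‑distance cost, the hyperparameters, and the names `mppi_plan` and `sequential_mppi`. The SeqWM repository may be organized quite differently.

```python
import numpy as np

def mppi_plan(base_trajectory, goal, rng, samples=256, sigma=0.5,
              temperature=1.0, iterations=10):
    """Minimal MPPI for one robot, conditioned on a shared predicted trajectory."""
    horizon, dim = base_trajectory.shape
    nominal = np.zeros((horizon, dim))
    for _ in range(iterations):
        noise = rng.normal(0.0, sigma, size=(samples, horizon, dim))
        candidates = nominal + noise  # (samples, horizon, dim)
        # Toy marginal model: own pushes accumulate on top of the shared trajectory.
        trajectories = base_trajectory + np.cumsum(candidates, axis=1)
        costs = np.linalg.norm(trajectories[:, -1] - goal, axis=1)
        # Path-integral weights; subtracting the minimum keeps exp() stable.
        weights = np.exp(-(costs - costs.min()) / temperature)
        weights /= weights.sum()
        nominal = np.einsum("k,khd->hd", weights, candidates)
    predicted = base_trajectory + np.cumsum(nominal, axis=0)
    return nominal, predicted

def sequential_mppi(start, goal, num_robots=2, horizon=5, seed=0):
    """Robots plan one after another and share their predicted trajectories."""
    rng = np.random.default_rng(seed)
    shared = np.tile(np.asarray(start, dtype=float), (horizon, 1))
    plans = []
    for _ in range(num_robots):
        plan, shared = mppi_plan(shared, goal, rng)
        plans.append(plan)
    return plans, shared

start, goal = np.zeros(2), np.array([2.0, 1.0])
plans, predicted = sequential_mppi(start, goal)
print(np.linalg.norm(predicted[-1] - goal))
```

Because the second robot plans against the first robot's predicted trajectory rather than against the raw state, it only needs to correct the remaining error, which is one way to read the "explicit intent sharing" described above.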

Experimental Evaluation

Benchmarks used:

Bi‑DexHands: dual‑hand dexterous manipulation.

Multi‑Quadruped: cooperative quadruped navigation.

Across all tasks, SeqWM consistently outperformed prior baselines in both success rate and sample efficiency.

Emergent Collaborative Behaviors

Predictive Adaptation: robots anticipate partners’ future actions and adjust early (e.g., moving to the predicted ball landing spot).

Role Division: in a box‑pushing task, one robot provides the main force while the other fine‑tunes direction, without explicit programming.

Sim‑to‑Real Validation

SeqWM was deployed on a Unitree Go2‑W platform for tasks such as box pushing, narrow‑gate passage, and leader‑follower navigation. The emergent collaborative behaviors matched simulation results, confirming real‑world applicability.

Code repository: https://github.com/zhaozijie2022/seqwm


Written by Data Party THU, the official platform of the Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
