How Sequential World Models Enable Scalable Multi‑Robot Cooperation
SeqWM introduces a sequential causal decomposition of multi‑robot dynamics in which each robot models its marginal contribution conditioned on the robots that precede it. This simplifies learning, improves sample efficiency, and yields natural collaborative behaviors both in simulation (Bi‑DexHands, Multi‑Quadruped) and in real‑world tests on the Unitree Go2‑W, and it outperforms prior methods.
Background
Decision‑Coupled World Models and model‑based reinforcement learning achieve strong results on single‑robot tasks. Extending them to multiple robots introduces a joint dynamics problem: several agents change the environment at the same time, which makes the dynamics hard to learn.
SeqWM: Sequential World Model
SeqWM treats the multi‑robot transition as a sequence of conditional updates. Each robot learns a marginal world model that predicts the environment change given its own action and the actions of all preceding robots.
The evolution of a multi‑robot world can be modeled as robots acting on the environment one after another.
Sequential Causal Decomposition
The joint dynamics p(s′|s,a₁,…,aₙ) is factorized as ∏ₖ p(sₖ′|sₖ, aₖ, a₁,…,aₖ₋₁), with one factor per robot k = 1,…,n. This reduces one high‑dimensional prediction problem to a series of simpler conditional predictions.
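To make the factorization concrete, here is a minimal toy sketch, not the SeqWM implementation. It assumes a 1‑D box whose total displacement per step is capped, and it uses hand‑written functions in place of learned models. The names `joint_step`, `marginal_step`, `sequential_step`, and `MAX_STEP` are hypothetical. The point is that each robot's marginal update depends on what the preceding robots already did, and chaining the marginal updates reproduces the joint transition.

```python
from typing import Sequence

MAX_STEP = 1.0  # hypothetical cap on how far the box can move in one step


def clip(x: float, lo: float = -MAX_STEP, hi: float = MAX_STEP) -> float:
    return max(lo, min(hi, x))


def joint_step(s: float, actions: Sequence[float]) -> float:
    """Joint dynamics: all pushes act at once and the total displacement is capped."""
    return s + clip(sum(actions))


def marginal_step(s_k: float, a_k: float, preceding: Sequence[float]) -> float:
    """Robot k's marginal update, conditioned on the preceding robots' actions."""
    before = clip(sum(preceding))        # displacement without robot k
    after = clip(sum(preceding) + a_k)   # displacement once robot k is added
    return s_k + (after - before)


def sequential_step(s: float, actions: Sequence[float]) -> float:
    """Apply the marginal updates one robot after another."""
    state = s
    for k, a_k in enumerate(actions):
        state = marginal_step(state, a_k, actions[:k])
    return state


if __name__ == "__main__":
    pushes = [0.6, 0.7, -0.2]
    print(joint_step(0.0, pushes), sequential_step(0.0, pushes))
```

In this toy case the second robot's push of 0.7 only moves the box by 0.4, because the first robot has already used up most of the cap. That is the kind of dependency a marginal model has to capture by conditioning on earlier actions.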
Trajectory Prediction
Each robot maintains an independent world model.
The model captures only the robot’s marginal contribution to the environment.
Later robots condition their predictions on the earlier robots’ predicted trajectories.
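The data flow in the three points above can be sketched as follows. This is an illustrative toy under stated assumptions: two robots on a line, hand‑written dynamics standing in for the learned world models, and a hypothetical `min_gap` constraint. The function names are made up for this example.

```python
from typing import List, Sequence

Trajectory = List[float]  # predicted 1-D positions over the planning horizon


def rollout_leader(x0: float, actions: Sequence[float]) -> Trajectory:
    """Robot 1 has no predecessors, so its prediction uses only its own actions."""
    traj, x = [], x0
    for a in actions:
        x = x + a
        traj.append(x)
    return traj


def rollout_follower(x0: float, actions: Sequence[float],
                     leader_traj: Trajectory, min_gap: float = 0.5) -> Trajectory:
    """Robot 2 conditions each predicted step on robot 1's predicted position."""
    traj, x = [], x0
    for a, leader_x in zip(actions, leader_traj):
        # The follower cannot get closer than min_gap to the leader.
        x = min(x + a, leader_x - min_gap)
        traj.append(x)
    return traj


if __name__ == "__main__":
    leader = rollout_leader(1.0, [0.5, 0.5, 0.0])
    follower = rollout_follower(0.0, [1.0, 1.0, 1.0], leader)
    print(leader)    # leader's predicted trajectory
    print(follower)  # follower's prediction, shaped by the leader's
```

The follower's model never needs to predict what the leader does. It only needs the leader's predicted trajectory as an input, which keeps each model small.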
Planning with MPPI
SeqWM employs Model‑Predictive Path Integral (MPPI) control. Robots plan in sequence and pass their predicted trajectories to the robots that plan after them, which gives explicit intent sharing and improves coordination.
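As a rough illustration of sequential planning with intent sharing, the sketch below uses a simplified MPPI loop: sample noisy action sequences, score them, and average them with weights exp(-cost / temperature). It is an assumption‑laden toy rather than the planner from the SeqWM repository. The 1‑D dynamics, the cost functions, and all names and parameter values (`mppi_plan`, `plan_sequentially`, `sigma`, `temperature`, `gap`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def rollout(x0: float, actions: Sequence[float]) -> List[float]:
    """Toy 1-D model: each action is a displacement added to the position."""
    traj, x = [], x0
    for a in actions:
        x += a
        traj.append(x)
    return traj


def mppi_plan(cost_fn: Callable[[Sequence[float]], float], horizon: int,
              rng: random.Random, num_samples: int = 256, sigma: float = 0.5,
              temperature: float = 1.0, iterations: int = 8) -> List[float]:
    """Simplified MPPI: refine a nominal action sequence by weighted averaging."""
    mean = [0.0] * horizon
    for _ in range(iterations):
        samples = [[m + rng.gauss(0.0, sigma) for m in mean]
                   for _ in range(num_samples)]
        costs = [cost_fn(s) for s in samples]
        best = min(costs)  # subtract the minimum for numerical stability
        weights = [math.exp(-(c - best) / temperature) for c in costs]
        total = sum(weights)
        # New nominal plan: cost-weighted average of the sampled sequences.
        mean = [sum(w * s[t] for w, s in zip(weights, samples)) / total
                for t in range(horizon)]
    return mean


def plan_sequentially(goal: float = 2.0, gap: float = 0.5, horizon: int = 4,
                      seed: int = 0) -> Tuple[List[float], List[float]]:
    """The leader plans first; the follower plans against the shared trajectory."""
    rng = random.Random(seed)

    def leader_cost(actions: Sequence[float]) -> float:
        return sum((x - goal) ** 2 for x in rollout(1.0, actions))

    leader_traj = rollout(1.0, mppi_plan(leader_cost, horizon, rng))

    def follower_cost(actions: Sequence[float]) -> float:
        # Penalize deviation from the desired gap behind the leader's plan.
        own = rollout(0.0, actions)
        return sum((lx - x - gap) ** 2 for x, lx in zip(own, leader_traj))

    follower_traj = rollout(0.0, mppi_plan(follower_cost, horizon, rng))
    return leader_traj, follower_traj


if __name__ == "__main__":
    leader, follower = plan_sequentially()
    print("leader  :", [round(x, 2) for x in leader])
    print("follower:", [round(x, 2) for x in follower])
```

The follower's cost function closes over `leader_traj`, so the leader's intent enters the follower's planning problem as a fixed input. A full MPPI controller would also replan at every step and usually adds a control‑cost term, both omitted here for brevity.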
Experimental Evaluation
Benchmarks used:
Bi‑DexHands: dual‑hand dexterous manipulation.
Multi‑Quadruped: cooperative quadruped navigation.
Across all tasks, SeqWM consistently outperformed prior baselines in success rate and sample efficiency.
Emergent Collaborative Behaviors
Predictive Adaptation: robots anticipate partners’ future actions and adjust early (e.g., moving to the predicted ball landing spot).
Role Division: in a box‑pushing task, one robot provides the main force while the other fine‑tunes direction, without explicit programming.
Sim‑to‑Real Validation
SeqWM was deployed on a Unitree Go2‑W platform for tasks such as box pushing, narrow‑gate passage, and leader‑follower navigation. The emergent collaborative behaviors matched simulation results, confirming real‑world applicability.
Code repository: https://github.com/zhaozijie2022/seqwm
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
