Nov 16, 2025 · Artificial Intelligence

How X‑VLA Enables 120‑Minute Unassisted Robot Clothing Folding with a 0.9B Model

The X‑VLA paper introduces a fully open‑source embodied model with 0.9 billion parameters. It uses learnable soft prompts and divide‑and‑conquer encoding to handle heterogeneous robot vision inputs, completes a record‑breaking 120‑minute autonomous clothing‑folding task, and surpasses existing benchmarks across five simulation environments.

Embodied AI · Multimodal Learning · X-VLA

7 min read


flow-matching
