ABot-M0: A Unified VLA Framework Solving the One‑Brain Many‑Forms Robotics Challenge

ABot-M0 is an open‑source Vision‑Language‑Action foundation model that unifies fragmented robot data, introduces Action Manifold Learning for smoother action prediction, and offers a plug‑and‑play dual‑stream perception architecture, achieving state‑of‑the‑art results on major manipulation benchmarks.

Amap Tech

Overview

ABot-M0 is an open‑source unified framework designed to address the “one‑brain, many‑forms” problem in robotics, providing a Vision‑Language‑Action (VLA) foundation model that integrates perception and manipulation.

Key Contributions

UniACT dataset: Consolidates multiple fragmented robot manipulation datasets into a single, standardized training resource, enabling large‑scale, diverse data for better generalization.

Action Manifold Learning (AML): Replaces noisy action prediction with direct prediction of smooth, physically plausible action trajectories on a low‑dimensional manifold, improving decoding speed and policy stability.

Plug‑and‑play dual‑stream perception: Combines a powerful VLM (Qwen3‑VL) for semantic understanding with an optional 3D module (e.g., VGGT) for geometric priors, allowing “what to do” and “where to do it” to be handled jointly without modifying the backbone.
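The plug‑and‑play idea in the last item can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy encoders, the `fuse` function, and fusion by simple concatenation are hypothetical stand‑ins, not ABot‑M0's actual modules or API. The point is only that the semantic stream runs unchanged whether or not a geometric stream is attached.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type aliases: a feature vector is a plain list of floats.
Features = List[float]
Encoder = Callable[[Sequence[float]], Features]


def semantic_encoder(image: Sequence[float]) -> Features:
    """Toy stand-in for the VLM stream ("what to do"): mean and max intensity."""
    return [sum(image) / len(image), max(image)]


def geometric_encoder(image: Sequence[float]) -> Features:
    """Toy stand-in for an optional 3D stream ("where to do it"): value range."""
    return [max(image) - min(image)]


def fuse(image: Sequence[float],
         semantic: Encoder,
         geometric: Optional[Encoder] = None) -> Features:
    """Concatenate semantic features with geometric ones if a 3D module is given.

    Concatenation is an assumed fusion rule chosen for simplicity. The
    semantic encoder is called the same way in both cases, so attaching or
    detaching the 3D stream never requires changing it.
    """
    features = list(semantic(image))
    if geometric is not None:
        features.extend(geometric(image))
    return features


# A flattened toy "image" of four pixel intensities.
image = [0.0, 0.5, 1.0, 0.5]
semantic_only = fuse(image, semantic_encoder)
with_geometry = fuse(image, semantic_encoder, geometric_encoder)
```

In this sketch the geometric features are appended after the semantic ones, so a downstream consumer written for the semantic‑only case still finds the same values in the same leading positions.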

Open‑source Release

The full code, data processing pipeline, and pretrained weights are released on GitHub (https://github.com/amap-cvlab/ABot-Manipulation) together with a project website. The repository also links to the companion open‑source project starVLA for pretrained models.

Benchmark results show ABot‑M0 achieving first place on the LIBERO‑PLUS and RoboCasa‑GR1‑Tabletop leaderboards.

Problem Addressed

Current robot learning suffers from three major issues: insufficient data scale, inconsistent data quality, and non‑unified representations. ABot‑M0 provides an end‑to‑end pipeline that transforms heterogeneous raw data into efficient, generalizable policies, making cross‑hardware and cross‑task embodied intelligence feasible.
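To make the "non‑unified representations" issue concrete, here is a minimal sketch of mapping two differently shaped records onto one schema. The two source formats, the field names, the 7‑value action layout, and the unit conversion are all invented for illustration; they are not the UniACT format or ABot‑M0's pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UnifiedStep:
    """Hypothetical common schema: one instruction and one action per step."""
    instruction: str
    # Assumed layout: [dx, dy, dz, droll, dpitch, dyaw, gripper]
    action: List[float]
    source: str


def from_dataset_a(record: Dict) -> UnifiedStep:
    """Made-up dataset A: pose deltas in metres, gripper state stored as a bool."""
    return UnifiedStep(
        instruction=record["task"].strip().lower(),
        action=list(record["delta_pose"]) + [float(record["gripper_open"])],
        source="dataset_a",
    )


def from_dataset_b(record: Dict) -> UnifiedStep:
    """Made-up dataset B: one flat action whose translation is in centimetres."""
    flat = record["act"]
    translation_m = [value / 100.0 for value in flat[:3]]  # cm -> m
    return UnifiedStep(
        instruction=record["lang"].strip().lower(),
        action=translation_m + list(flat[3:]),
        source="dataset_b",
    )


# The same motion, recorded in two incompatible formats.
record_a = {
    "task": " Pick up the cup ",
    "delta_pose": [0.5, 0.0, -0.25, 0.0, 0.0, 0.0],
    "gripper_open": True,
}
record_b = {
    "lang": "Pick up the cup",
    "act": [50.0, 0.0, -25.0, 0.0, 0.0, 0.0, 1.0],
}

unified = [from_dataset_a(record_a), from_dataset_b(record_b)]
```

After conversion both records carry the same instruction text and the same action vector, so a single policy could train on them without per‑dataset special cases.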

