Open-Source XR-1: China’s First Embodied VLA Model for Robots
Beijing Humanoid Robot Innovation Center has open-sourced XR-1, described as China's first vision-language-action (VLA) model to meet the national embodied-intelligence standard, together with its supporting datasets RoboMIND 2.0 and ArtVIP. This article outlines the model's three-stage training paradigm and its cross-modal capabilities.
XR‑1 VLA Model Overview
XR-1 is an open-source vision-language-action (VLA) model for embodied robotics, and the first in China to pass the national embodied-intelligence standard. It is designed to generalize across multiple robot bodies, scenes, and tasks.
Technical Foundations
The model is built on three pillars:
Cross-data-source learning: leverages a large corpus of human-recorded videos (more than one million virtual and real samples spanning multiple robot bodies) to reduce training cost and improve data efficiency.
Cross-modal alignment: aligns visual perception with motor actions, so that what the model perceives is tied directly to what it does.
Cross-body control: abstracts control policies so that the same model can be deployed on robot platforms from different brands.
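One way to picture cross-body control is a shared, body-agnostic action that per-robot adapters translate into platform-specific commands. The sketch below is only an illustration of that idea, not XR-1's actual interface: `AbstractAction`, `SingleArmAdapter`, `HumanoidRightHandAdapter`, `dispatch`, and all command names and limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class AbstractAction:
    """Hypothetical body-agnostic action: end-effector displacement plus gripper opening."""
    dx: float
    dy: float
    dz: float
    gripper: float  # 0.0 = closed, 1.0 = fully open


class BodyAdapter(Protocol):
    """Anything that can turn an abstract action into robot-specific commands."""
    def to_commands(self, action: AbstractAction) -> Dict[str, float]: ...


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


class SingleArmAdapter:
    """Assumed arm with a parallel gripper and a per-step motion limit in metres."""
    def __init__(self, max_step_m: float = 0.05, max_width_m: float = 0.08) -> None:
        self.max_step_m = max_step_m
        self.max_width_m = max_width_m

    def to_commands(self, action: AbstractAction) -> Dict[str, float]:
        m = self.max_step_m
        return {
            "ee_dx": clamp(action.dx, -m, m),
            "ee_dy": clamp(action.dy, -m, m),
            "ee_dz": clamp(action.dz, -m, m),
            "gripper_width_m": self.max_width_m * clamp(action.gripper, 0.0, 1.0),
        }


class HumanoidRightHandAdapter:
    """Assumed humanoid hand: gripper opening becomes finger curl (1.0 = fully curled)."""
    def to_commands(self, action: AbstractAction) -> Dict[str, float]:
        return {
            "right_hand_dx": action.dx,
            "right_hand_dy": action.dy,
            "right_hand_dz": action.dz,
            "finger_curl": 1.0 - clamp(action.gripper, 0.0, 1.0),
        }


def dispatch(action: AbstractAction,
             adapters: Dict[str, BodyAdapter]) -> Dict[str, Dict[str, float]]:
    """Send the same abstract action through every registered body adapter."""
    return {name: adapter.to_commands(action) for name, adapter in adapters.items()}
```

In this sketch the policy only ever produces `AbstractAction` values, so supporting a new robot means writing one more adapter rather than retraining or changing the policy's output format.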
Three‑Stage Training Paradigm
Stage 1 – Dictionary Construction: ingest the multi-body video corpus and compress complex scenes and motions into a discrete "action code" dictionary for fast retrieval.
Stage 2 – Cross-body Pre-training: train on the large-scale robot dataset to learn basic physical laws (e.g., objects fall when released, doors open when pushed).
Stage 3 – Task-specific Fine-tuning: use a small amount of task-oriented data (e.g., sorting, box moving, clothing folding) to adapt the model to concrete applications.
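The "action code" dictionary in Stage 1 can be illustrated with plain vector quantization: continuous motion features are clustered into a small codebook, and each step of a trajectory is then replaced by the index of its nearest code. The article does not say how XR-1 builds its dictionary, so the k-means-style procedure below is a swapped-in stand-in, and `build_codebook`, `nearest_code`, `encode`, and the toy 2-D features are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(codebook: List[List[float]], v: Vector) -> int:
    """Index of the codebook entry closest to v (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], v))


def build_codebook(samples: List[List[float]], num_codes: int,
                   iterations: int = 10) -> List[List[float]]:
    """Toy k-means: learn num_codes centroids from motion feature vectors."""
    if not 1 <= num_codes <= len(samples):
        raise ValueError("num_codes must be between 1 and len(samples)")
    # Deterministic initialisation: evenly spaced samples become the first codes.
    step = len(samples) // num_codes
    codebook = [list(samples[i * step]) for i in range(num_codes)]
    for _ in range(iterations):
        buckets: List[List[List[float]]] = [[] for _ in range(num_codes)]
        for s in samples:
            buckets[nearest_code(codebook, s)].append(s)
        for i, bucket in enumerate(buckets):
            if bucket:  # leave a code unchanged if no sample was assigned to it
                dim = len(bucket[0])
                codebook[i] = [sum(v[d] for v in bucket) / len(bucket) for d in range(dim)]
    return codebook


def encode(codebook: List[List[float]], trajectory: List[List[float]]) -> List[int]:
    """Replace each step of a trajectory with the index of its nearest code."""
    return [nearest_code(codebook, step) for step in trajectory]


# Toy data: two clearly separated kinds of motion feature.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook = build_codebook(samples, num_codes=2)
codes = encode(codebook, [[0.2, 0.1], [9.5, 10.2], [0.9, 0.3]])
```

Once trajectories are stored as short integer sequences like `codes`, looking up or comparing motions becomes a matter of matching indices rather than raw video frames, which is the kind of fast retrieval Stage 1 describes.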
Data Foundations
XR‑1 is accompanied by two datasets:
RoboMIND 2.0: a multi-body robot data collection that provides the video samples used in Stage 1 and Stage 2.
ArtVIP: a high-quality visual-action dataset hosted on Hugging Face, used to improve cross-modal alignment.
Resources
Repository and dataset links:
XR-1 GitHub: https://github.com/Open-X-Humanoid/XR-1
RoboMIND 2.0: https://modelscope.cn/collections/X-Humanoid/RoboMIND20
ArtVIP dataset: https://huggingface.co/datasets/x-humanoid-robomind/ArtVIP
