Open-Source XR-1: China’s First Embodied VLA Model for Robots

Beijing Humanoid Robot Innovation Center has open‑sourced XR‑1, the nation’s first VLA (vision‑language‑action) model that meets embodied‑intelligence standards, along with its supporting datasets RoboMIND 2.0 and ArtVIP, detailing its three‑stage training paradigm and cross‑modal capabilities.


XR‑1 VLA Model Overview

XR‑1 is an open‑source vision‑language‑action (VLA) large model for embodied robotics, the first in China to pass the national embodied‑intelligence standard. It supports multiple robot bodies, scenes and tasks with strong generalization.

[Image: robot, computer, office, artificial intelligence]

Technical Foundations

The model is built on three pillars:

Cross‑data‑source learning: leverages a large collection of human‑recorded videos (over one million virtual and real multi‑body samples) to reduce training cost and improve data efficiency.

Cross‑modal alignment: aligns visual perception with motor actions, enabling knowledge‑action integration.

Cross‑body control: abstracts control policies so the same model can be deployed on different robot platforms and brands.
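The cross‑body control idea above can be sketched as a thin abstraction layer: the policy emits one shared action vector, and each platform implements its own adapter for it. This is a minimal illustration, not the XR‑1 codebase; the names `RobotBody`, `ArmBot`, and `deploy` are hypothetical.

```python
# Hypothetical sketch of cross-body control: one policy output,
# many robot bodies behind a common interface.
from abc import ABC, abstractmethod
from typing import List


class RobotBody(ABC):
    """Abstract interface each robot platform implements."""

    @abstractmethod
    def apply(self, action: List[float]) -> None:
        ...


class ArmBot(RobotBody):
    """An illustrative 6-joint arm that interprets actions as joint deltas."""

    def __init__(self) -> None:
        self.joint_angles = [0.0] * 6

    def apply(self, action: List[float]) -> None:
        # Add each action component to the corresponding joint angle.
        self.joint_angles = [a + d for a, d in zip(self.joint_angles, action)]


def deploy(policy_output: List[float], body: RobotBody) -> None:
    # The same model output drives any body that implements the interface.
    body.apply(policy_output)


bot = ArmBot()
deploy([0.1, 0.0, 0.0, 0.0, 0.0, 0.0], bot)
```

Swapping in a different platform only requires a new `RobotBody` subclass; the policy itself stays unchanged, which is the point of abstracting control policies away from any one robot brand.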

Three‑Stage Training Paradigm

Stage 1 – Dictionary Construction: ingest the multi‑body video corpus and compress complex scenes and motions into a discrete “action code” dictionary for fast retrieval.

Stage 2 – Cross‑body Pre‑training: train on the large‑scale robot dataset to learn basic physical laws (e.g., objects fall when released, doors open when pushed).

Stage 3 – Task‑specific Fine‑tuning: use a small amount of task‑oriented data (e.g., sorting, box moving, clothing folding) to adapt the model to concrete applications.
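The Stage 1 idea of compressing continuous motions into a discrete dictionary can be sketched as a tiny vector‑quantization codebook: each motion vector maps to its nearest code. The codebook entries here are invented for illustration; the article does not publish XR‑1’s actual action tokenizer.

```python
# Illustrative sketch of an "action code" dictionary (VQ-style codebook).
# The codes and vectors below are made up, not XR-1's real dictionary.
from math import dist

codebook = {
    0: (0.0, 0.0),  # stay
    1: (1.0, 0.0),  # move right
    2: (0.0, 1.0),  # move up
}


def encode(motion):
    """Map a continuous motion vector to its nearest discrete action code."""
    return min(codebook, key=lambda c: dist(codebook[c], motion))


def decode(code):
    """Recover the representative motion for a stored action code."""
    return codebook[code]


trajectory = [(0.9, 0.1), (0.1, 1.1), (0.05, -0.02)]
codes = [encode(m) for m in trajectory]  # -> [1, 2, 0]
```

Once motions are reduced to codes like this, Stage 2 can pre‑train over discrete sequences, and Stage 3 fine‑tunes the same representation on a small amount of task‑specific data.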

Data Foundations

XR‑1 is accompanied by two datasets:

RoboMIND 2.0: a multi‑body robot data collection that provides the video samples used in Stage 1 and Stage 2.

ArtVIP: a high‑quality visual‑action dataset hosted on Hugging Face, used to improve cross‑modal alignment.

Resources

Repository and dataset links:

XR‑1 GitHub: https://github.com/Open-X-Humanoid/XR-1

RoboMIND 2.0: https://modelscope.cn/collections/X-Humanoid/RoboMIND20

ArtVIP dataset: https://huggingface.co/datasets/x-humanoid-robomind/ArtVIP
Tags: open-source, embodied AI, robotics, ArtVIP, RoboMIND, VLA model, XR-1
Written by

21CTO

21CTO (21CTO.com) offers developers a community, training, and services, making it a go‑to learning and service platform.
