
Open-Source XR-1: China’s First Embodied VLA Model for Robots

Beijing Humanoid Robot Innovation Center has open‑sourced XR‑1, China's first VLA (vision‑language‑action) model to meet the national embodied‑intelligence standard, together with its supporting datasets RoboMIND 2.0 and ArtVIP. This article outlines the model's three‑stage training paradigm and cross‑modal capabilities.

21CTO

Dec 22, 2025


XR‑1 VLA Model Overview

XR‑1 is an open‑source vision‑language‑action (VLA) large model for embodied robotics, and the first in China to pass the national embodied‑intelligence standard. It is designed to generalize across multiple robot bodies, scenes, and tasks.

Technical Foundations

The model is built on three pillars:

Cross‑data‑source learning: leverages a large collection of human‑recorded videos (over one million multi‑body virtual and real samples) to reduce training cost and improve data efficiency.

Cross‑modal alignment: aligns visual perception with motor actions, enabling knowledge‑action integration.

Cross‑body control: abstracts control policies so the same model can be deployed on different robot platforms and brands.
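The cross‑body idea can be illustrated with a minimal sketch. The article does not describe how XR‑1 abstracts its control policies, so everything here is an assumption for illustration only: the `BodyAdapter` and `JointLimits` names, the normalized action range of [-1, 1], and the example joint limits are all hypothetical, and the two example bodies are assumed to have the same number of joints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointLimits:
    """Lower and upper bound of one joint, in that robot's own units."""
    lower: float
    upper: float


class BodyAdapter:
    """Hypothetical adapter: maps a shared, normalized action to one robot body."""

    def __init__(self, name, limits):
        self.name = name
        self.limits = limits

    def to_joint_targets(self, action):
        """Scale each action component from [-1, 1] into the joint's range."""
        if len(action) != len(self.limits):
            raise ValueError(
                f"{self.name}: expected {len(self.limits)} values, got {len(action)}"
            )
        targets = []
        for value, lim in zip(action, self.limits):
            value = max(-1.0, min(1.0, value))  # clamp out-of-range policy output
            mid = (lim.upper + lim.lower) / 2
            half = (lim.upper - lim.lower) / 2
            targets.append(mid + value * half)
        return targets


# Two illustrative bodies with different joint ranges (made-up numbers).
arm_a = BodyAdapter("arm_a", [JointLimits(-1.0, 1.0), JointLimits(0.0, 2.0)])
arm_b = BodyAdapter("arm_b", [JointLimits(-3.0, 3.0), JointLimits(-0.5, 0.5)])

shared_action = [0.5, -1.0]  # what a body-agnostic policy might emit
print(arm_a.to_joint_targets(shared_action))  # [0.5, 0.0]
print(arm_b.to_joint_targets(shared_action))  # [1.5, -0.5]
```

The point of the design is that the policy output stays identical across platforms, and only the thin adapter layer knows about a specific robot's hardware.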

Three‑Stage Training Paradigm

Stage 1 – Dictionary Construction: ingest the multi‑body video corpus and compress complex scenes and motions into a discrete “action code” dictionary for fast retrieval.

Stage 2 – Cross‑body Pre‑training: train on the large‑scale robot dataset to learn basic physical laws (e.g., objects fall when released, doors open when pushed).

Stage 3 – Task‑specific Fine‑tuning: use a small amount of task‑oriented data (e.g., sorting, box moving, clothing folding) to adapt the model to concrete applications.
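To make the Stage 1 notion of a discrete “action code” dictionary more concrete, here is a toy nearest‑neighbor vector quantization sketch. The article does not say which method XR‑1 uses to build or query its dictionary, so this is a generic stand‑in technique, not the authors' implementation. The tiny two‑dimensional `CODEBOOK` and the sample trajectory are made up; a real codebook would be learned from data rather than written by hand.

```python
from math import dist


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))


def encode(trajectory, codebook):
    """Replace each continuous motion vector with its discrete code index."""
    return [nearest_code(v, codebook) for v in trajectory]


def decode(codes, codebook):
    """Look the code indices back up to get approximate motion vectors."""
    return [codebook[c] for c in codes]


# Hypothetical hand-written codebook with four entries.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

trajectory = [(0.1, -0.05), (0.9, 0.2), (0.8, 0.95), (0.1, 0.9)]
codes = encode(trajectory, CODEBOOK)
print(codes)                    # [0, 1, 3, 2]
print(decode(codes, CODEBOOK))  # the four codebook corners, in that order
```

Each motion vector collapses to a single integer, which is what makes storage compact and lookup fast. The cost is the approximation error between each original vector and its codebook entry.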

Data Foundations

XR‑1 is accompanied by two datasets:

RoboMIND 2.0: a multi‑body robot data collection that provides the video samples used in Stage 1 and Stage 2.

ArtVIP: a high‑quality visual‑action dataset hosted on Hugging Face, used to improve cross‑modal alignment.

Resources

Repository and dataset links:

XR‑1 GitHub: https://github.com/Open-X-Humanoid/XR-1

RoboMIND 2.0: https://modelscope.cn/collections/X-Humanoid/RoboMIND20

ArtVIP dataset: https://huggingface.co/datasets/x-humanoid-robomind/ArtVIP
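For readers who want a local copy of ArtVIP, the following is a minimal sketch using the `huggingface_hub` package's `snapshot_download` function. It assumes the package is installed (`pip install huggingface_hub`) and that the dataset repository is publicly accessible. The helper names and the `artvip_data` target directory are hypothetical, and the article does not document the dataset's file layout or size, so check the dataset page before downloading.

```python
from urllib.parse import urlparse

# Dataset URL as given in the article.
ARTVIP_URL = "https://huggingface.co/datasets/x-humanoid-robomind/ArtVIP"


def dataset_repo_id(url):
    """Extract '<owner>/<name>' from a huggingface.co/datasets/... URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_artvip(local_dir="artvip_data"):
    """Download the dataset snapshot and return the local path.

    `local_dir` is an arbitrary, hypothetical directory name.
    """
    # Imported lazily so dataset_repo_id() works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=dataset_repo_id(ARTVIP_URL),
        repo_type="dataset",
        local_dir=local_dir,
    )
```

Calling `download_artvip()` triggers the actual network download. `dataset_repo_id(ARTVIP_URL)` on its own only parses the URL and returns `"x-humanoid-robomind/ArtVIP"`.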

