
HyVLA-0.5: Sub‑millimeter UMI Data and Real‑Robot Reinforcement Eliminate Heavy Tele‑operation

HyVLA-0.5, an open‑source embodied VLA model from Tencent Robotics X, leverages over 10,000 hours of sub‑millimeter UMI demonstration data and a novel FlowPRO reinforcement pipeline to achieve more than 90% success on simulated and real‑world tasks, while supporting cross‑embodiment transfer and asynchronous deployment.

Machine Heart

Jun 15, 2026


Overview

On June 15, Tencent Robotics X, together with Futian Lab and the Hunyuan team, released Hy‑Embodied‑0.5‑VLA (HyVLA‑0.5), an end‑to‑end embodied intelligence model for real‑world robot manipulation. The model is built on a self‑developed sub‑millimeter finger‑ring UMI data‑capture system (patent 2025020117CN) that has collected more than 10,000 hours of human‑demonstration data covering 70 task categories and over one million episodes.

High‑Precision UMI Data Collection

The UMI workstation records first‑person visual streams and, via an external optical motion‑capture system, provides 6‑DoF trajectory annotations with sub‑millimeter accuracy. Some end‑effectors also embed force/torque sensors, so the dataset naturally contains physical interaction signals useful for force‑aware learning.

The Hy‑UMI‑10K dataset was built with this system and comprises more than 10,000 hours of data. The team plans to release 2,000 hours of the raw UMI recordings to the community for co‑development.

Model Architecture

HyVLA‑0.5 extends the Hy‑Embodied‑0.5 visual‑language model with a flow‑matching action expert module that directly generates continuous motion trajectories. A dual‑tower design decouples visual‑language understanding from action generation, enabling joint semantic perception, spatial reasoning, and low‑level control.
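As a rough illustration of what a flow‑matching action expert does, the sketch below shows the standard straight‑path flow‑matching loss and an Euler sampler for a chunk of continuous actions. It is a generic textbook formulation, not the HyVLA‑0.5 implementation: the names flow_matching_loss, sample_actions, and velocity_fn are hypothetical, and in a real model the velocity network would also be conditioned on visual‑language features, which are omitted here.

```python
import numpy as np

def flow_matching_loss(velocity_fn, actions, rng):
    """Straight-path flow-matching loss on a batch of action chunks (B, H, D)."""
    noise = rng.standard_normal(actions.shape)        # x_0 ~ N(0, I)
    t = rng.uniform(size=(actions.shape[0], 1, 1))    # one time per sample, in [0, 1)
    x_t = (1.0 - t) * noise + t * actions             # point on the noise-to-action path
    target = actions - noise                          # constant velocity of that path
    return float(np.mean((velocity_fn(x_t, t) - target) ** 2))

def sample_actions(velocity_fn, shape, rng, num_steps=10):
    """Integrate dx/dt = v(x, t) from noise (t=0) to actions (t=1) with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((shape[0], 1, 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x

# Toy check (assumption: a single known target chunk stands in for the data).
# For one target, the ideal velocity field is (goal - x) / (1 - t).
goal = np.linspace(0.0, 1.0, 8 * 7).reshape(1, 8, 7)   # 8 steps, 7-D actions
oracle = lambda x, t: (goal - x) / (1.0 - t)
rng = np.random.default_rng(0)
chunk = sample_actions(oracle, goal.shape, rng)
```

With the oracle field the sampler lands on the target chunk, which is a convenient sanity check before swapping in a learned network.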

A compact memory encoder compresses multi‑frame, multi‑view visual histories into a concise current‑frame representation, adding short‑term memory without inflating token counts. Incremental action‑block representations predict motion increments relative to the current end‑effector state, reducing dependence on specific robot kinematics and facilitating cross‑embodiment transfer.
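The idea of predicting increments relative to the current end‑effector state can be sketched with homogeneous transforms. This is a minimal example under the assumption that poses are 4x4 matrices and that the deltas are expressed in the current end‑effector frame; the article does not specify the exact parameterization, and the helper names (pose, to_relative_chunk, to_absolute_chunk) are hypothetical.

```python
import numpy as np

def pose(yaw, xyz):
    """Build a 4x4 pose from a yaw angle (rad) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = xyz
    return T

def to_relative_chunk(T_current, T_targets):
    """Express absolute target poses (N, 4, 4) in the current end-effector frame."""
    return np.linalg.inv(T_current) @ T_targets

def to_absolute_chunk(T_current, T_deltas):
    """Replay relative deltas (N, 4, 4) from any current end-effector pose."""
    return T_current @ T_deltas

# Deltas recorded from one starting pose...
T_now = pose(0.3, [0.4, 0.0, 0.2])
targets = np.stack([pose(0.3 + 0.05 * k, [0.4 + 0.01 * k, 0.0, 0.2])
                    for k in range(1, 5)])
deltas = to_relative_chunk(T_now, targets)

# ...can be replayed from a different pose, e.g. on another robot.
T_other = pose(-1.0, [0.1, 0.5, 0.3])
replayed = to_absolute_chunk(T_other, deltas)
```

Because the deltas never mention a base frame or joint angles, the same chunk describes the same motion for any arm that can track the resulting end‑effector poses.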

Training Pipeline

Training proceeds in two stages. First, continued pre‑training on the Hy‑UMI‑10K dataset learns general action priors from large‑scale human demonstrations. Second, supervised fine‑tuning follows two complementary tracks:

Track‑A: same‑embodiment adaptation. Demonstrations are collected on the target robot and the model is fine‑tuned for that platform.

Track‑B: UMI‑only cross‑embodiment transfer. The model is fine‑tuned solely on UMI demonstrations, then deployed on a robot with a different morphology.

This design validates both precise platform adaptation (Track‑A) and the ability of high‑fidelity UMI data to bridge morphology gaps (Track‑B).

FlowPRO Reinforcement Learning

For post‑training, the team introduced FlowPRO, which systematically applies Proximalized Preference Optimization (PRO) to flow‑matching VLA reinforcement. Unlike methods that rely on handcrafted rewards, FlowPRO captures paired failure and corrective trajectories via real‑robot interventions and roll‑backs, converting them into offline preference signals.

The core RPRO loss directly compares preferred versus non‑preferred actions within the flow‑matching objective, while a proximal regularizer curbs reward drift, mitigating reward‑hacking and catastrophic forgetting. Experiments on four dual‑arm tasks (Bottle, Cap, USB, Zip) show FlowPRO consistently outperforms DAgger and a PI0.6* baseline, pushing success rates toward ceiling levels after three training rounds.
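The article does not give the exact form of the loss, so the following is only a stand‑in to make the idea concrete: a DPO‑style preference term computed on flow‑matching errors, plus a simple quadratic penalty that keeps the policy close to a frozen reference. The function name, its arguments, and the beta and prox_weight parameters are all assumptions for illustration, not the FlowPRO objective itself.

```python
import numpy as np

def preference_flow_loss(err_w, err_l, ref_err_w, ref_err_l,
                         beta=1.0, prox_weight=0.1):
    """Illustrative preference loss on per-pair flow-matching errors.

    err_w / err_l: policy errors on preferred / non-preferred action chunks.
    ref_err_w / ref_err_l: the same errors under a frozen reference policy.
    """
    err_w, err_l = np.asarray(err_w, float), np.asarray(err_l, float)
    ref_err_w, ref_err_l = np.asarray(ref_err_w, float), np.asarray(ref_err_l, float)
    gain_w = ref_err_w - err_w            # > 0: policy fits the preferred chunk better
    gain_l = ref_err_l - err_l            # > 0: policy fits the rejected chunk better
    margin = gain_w - gain_l
    pref = np.logaddexp(0.0, -beta * margin)   # equals -log(sigmoid(beta * margin))
    prox = gain_w ** 2 + gain_l ** 2           # assumed penalty for drifting from the reference
    return float(np.mean(pref + prox_weight * prox))

# A policy identical to the reference sits at log(2); moving toward the
# preferred chunk and away from the rejected one lowers the loss.
base = preference_flow_loss([1.0], [1.0], [1.0], [1.0])
improved = preference_flow_loss([0.5], [1.5], [1.0], [1.0])
```

Raising prox_weight makes large departures from the reference expensive even when they widen the preference margin, which is the general role a proximal regularizer plays against reward hacking and forgetting.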

Real‑Robot Deployment

HyVLA‑0.5 implements a cross‑embodiment morphology mapping that converts the model's incremental action outputs into the coordinate frames of each target platform (fixed‑base dual‑arm, humanoid, etc.) and solves inverse kinematics for it. Asynchronous inference decouples the slow, high‑capacity VLA forward pass from the robot's servo loop, with an action‑command buffer hiding inference latency. To remove discontinuities at action‑block boundaries, a delay‑aware cubic Bézier trajectory‑stitching method produces smooth, high‑frequency execution without extra controller training.
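One way to picture delay‑aware Bézier stitching is sketched below. It assumes both action blocks are waypoint arrays on a shared control‑tick axis, that the new block arrives delay_steps ticks after the observation it was computed from, and that the blend lasts blend_steps ticks. These conventions and the function names are illustrative assumptions rather than the released implementation.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, s):
    """Evaluate a cubic Bezier curve at parameters s in [0, 1]; returns (len(s), D)."""
    s = np.asarray(s, float)[:, None]
    return ((1 - s) ** 3 * p0 + 3 * (1 - s) ** 2 * s * p1
            + 3 * (1 - s) * s ** 2 * p2 + s ** 3 * p3)

def stitch_chunks(old_chunk, new_chunk, delay_steps, blend_steps):
    """Bridge from the old block to the new one, skipping the stale prefix.

    Requires delay_steps >= 1 and delay_steps + blend_steps < len(new_chunk).
    Returns the commands for ticks delay_steps + 1 onward.
    """
    start, end = delay_steps, delay_steps + blend_steps
    p0, p3 = old_chunk[start], new_chunk[end]
    v0 = old_chunk[start] - old_chunk[start - 1]   # per-tick velocity leaving the old block
    v3 = new_chunk[end] - new_chunk[end - 1]       # per-tick velocity entering the new block
    # The curve's derivative at s=0 is 3 * (p1 - p0) per unit s, and one unit
    # of s spans blend_steps ticks, so these control points match v0 and v3.
    p1 = p0 + v0 * blend_steps / 3.0
    p2 = p3 - v3 * blend_steps / 3.0
    s = np.arange(1, blend_steps + 1) / blend_steps
    bridge = cubic_bezier(p0, p1, p2, p3, s)
    return np.concatenate([bridge, new_chunk[end + 1:]])

# Example: the new block is offset from the old one by a small jump.
ticks = np.arange(20)[:, None]
old_chunk = ticks * np.array([[0.01, 0.0, 0.02]])
new_chunk = old_chunk + np.array([[0.0, 0.005, 0.0]])
stitched = stitch_chunks(old_chunk, new_chunk, delay_steps=3, blend_steps=5)
```

Matching both position and velocity at the two ends is what removes the jump at the block boundary; the first delay_steps waypoints of the new block are never executed because their ticks have already passed.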

Benchmark Evaluation

On the RoboTwin 2.0 simulation benchmark, HyVLA‑0.5 achieves success rates of 90.9% (Clean) and 90.1% (Randomized), surpassing contemporary VLA systems. Real‑robot tests on the Dobot X‑Trainer, JAKA K1, Astribot S1, and Unitree G1 cover same‑embodiment adaptation, cross‑embodiment transfer, force‑sensing tasks, and FlowPRO post‑training.

Broader Impact

Beyond the model itself, the release demonstrates a full‑stack pathway from high‑quality data acquisition to deployment: UMI‑driven action priors, embodied architecture linking perception to control, dual‑track fine‑tuning for transfer, FlowPRO‑driven failure‑to‑success learning, and asynchronous execution for stable closed‑loop control. This aligns with Tencent Robotics X’s strategy of open‑source, modular platforms (Tairos, HY‑Embodied series, RoboFusion) to lower entry barriers for robot manufacturers and application developers.
