How Xiaomi’s Tactile‑Enabled Robot Graduates from Lab to Automotive Assembly Line

This article details how Xiaomi Robotics moved its VLA-based robot, equipped with TacRefineNet tactile perception, from laboratory experiments to a real automotive factory. Running autonomously for three hours, the robot achieved a 90.2% dual-side installation success rate while keeping pace with a 76-second production cycle. The article also explains the end-to-end data-driven control, multimodal sensing, whole-body motion strategy, typical failure cases, and open resources.

Xiaomi Tech

Previously we introduced the tactile-refinement model TacRefineNet and the Vision-Language-Action (VLA) large model Xiaomi-Robotics-0, which together give robots a knowledgeable brain, an agile body, and the previously missing sense of touch.

Moving from the lab to an automotive factory exposes a large reality gap in two respects: production cadence and pass rate. In the lab, thousands of failed iterations are acceptable; the factory demands precise, reliable actions timed to the second. The robot must evolve from an "apprentice" into a "full-time worker".

At a self-tapping-nut pick-and-place station in a real factory, the robot ran autonomously for three hours, achieving a dual-side installation success rate of 90.2% while keeping up with the line's fastest cadence of 76 seconds per cycle.
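To put the 76-second cadence in perspective: if every cycle ran at exactly that pace, a three-hour run would fit at most 10,800 / 76, or 142, complete cycles. The snippet below does that arithmetic. `max_cycles` and `meets_cadence` are illustrative helpers, not Xiaomi code, and because they assume back-to-back cycles with no downtime, the result is an upper bound rather than the number of cycles actually run.

```python
# Hypothetical helpers (not from Xiaomi's code): relate a fixed line cadence
# to the number of cycles that fit in a continuous run.


def max_cycles(run_hours: float, cycle_seconds: float) -> int:
    """Upper bound on complete cycles, assuming back-to-back cycles and no downtime."""
    if cycle_seconds <= 0:
        raise ValueError("cycle_seconds must be positive")
    window_seconds = run_hours * 3600
    return int(window_seconds // cycle_seconds)


def meets_cadence(measured_cycle_seconds: float, cadence_seconds: float = 76.0) -> bool:
    """True if one measured robot cycle fits inside the line cadence."""
    return measured_cycle_seconds <= cadence_seconds


if __name__ == "__main__":
    print(max_cycles(3, 76))    # 142  (10800 s // 76 s)
    print(meets_cadence(74.5))  # True  (hypothetical measurement)
    print(meets_cadence(80.0))  # False (hypothetical measurement)
```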

The task requires the robot to repeatedly grasp self-tapping nuts from an automatic feeder and place them on a positioning fixture. Working together with the station's slide table and automatic locking mechanism, this automates the fastening of nuts onto the integrally cast floor panel.
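One station cycle, as described above, is a fixed sequence of robot and equipment steps. The sketch below models it as an ordered list with per-step retries, and it aborts before locking if a step keeps failing. The step names, the retry policy, and the `execute` callback are hypothetical and are not taken from Xiaomi's station software.

```python
from enum import Enum, auto


class Step(Enum):
    # Hypothetical step identifiers; the article names the actions, not the code.
    PICK_FROM_FEEDER = auto()
    PLACE_ON_FIXTURE = auto()
    SLIDE_TABLE_ADVANCE = auto()
    AUTO_LOCK = auto()


# Order of one station cycle as described in the text (sketch only).
SEQUENCE = [Step.PICK_FROM_FEEDER, Step.PLACE_ON_FIXTURE,
            Step.SLIDE_TABLE_ADVANCE, Step.AUTO_LOCK]


def run_cycle(execute, max_retries: int = 1):
    """Run one cycle; `execute(step)` returns True on success.

    Returns (completed, trace), where trace lists every attempted step.
    """
    trace = []
    for step in SEQUENCE:
        for _ in range(max_retries + 1):
            trace.append(step)
            if execute(step):
                break
        else:
            # Retries exhausted: stop before the station locks a misplaced nut.
            return False, trace
    return True, trace


if __name__ == "__main__":
    # Simulated executor: the first placement attempt misses, the retry succeeds.
    attempts = {"place": 0}

    def fake_execute(step):
        if step is Step.PLACE_ON_FIXTURE:
            attempts["place"] += 1
            return attempts["place"] > 1
        return True

    ok, trace = run_cycle(fake_execute)
    print(ok, [s.name for s in trace])
```

Keeping the abort ahead of `AUTO_LOCK` reflects a reasonable design choice for this kind of station: a failed placement should stop the cycle rather than let the locking unit act on a nut that is not seated.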

Key technical highlights:

General VLA base model: Unified action‑space design and cross‑embodiment pre‑training dramatically improve generalization in task understanding, spatial perception, and motion execution.

VLA + RL joint training framework: Introducing reinforcement learning reduces dependence on real‑robot tele‑operation data and enhances cross‑embodiment and cross‑environment generalization.

Tactile information fusion: Incorporating tactile feedback in contact-rich factory scenarios significantly improves stability and robustness.
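The article does not detail the unified action space named in the first highlight. One common way to build one is to scatter each embodiment's native action into a fixed-width vector and keep a mask marking which slots are real. A single model can then train on data from different robots while the loss ignores padding. In the sketch below, the 32-dimensional width, the embodiment names, and the slot layout are all hypothetical.

```python
from typing import Dict, List, Tuple

UNIFIED_DIM = 32  # hypothetical width of the shared action vector

# Hypothetical slot layout: unified indices that each native action dim maps to.
EMBODIMENT_SLOTS: Dict[str, List[int]] = {
    "single_arm_7dof": list(range(0, 7)) + [14],        # 7 joints + gripper
    "dual_arm_14dof": list(range(0, 14)) + [14, 15],    # 14 joints + 2 grippers
}


def to_unified(embodiment: str, action: List[float]) -> Tuple[List[float], List[int]]:
    """Scatter a native action into the unified vector; return (vector, mask)."""
    slots = EMBODIMENT_SLOTS[embodiment]
    if len(action) != len(slots):
        raise ValueError(f"{embodiment} expects {len(slots)} dims, got {len(action)}")
    vec = [0.0] * UNIFIED_DIM
    mask = [0] * UNIFIED_DIM
    for value, idx in zip(action, slots):
        vec[idx] = value
        mask[idx] = 1
    return vec, mask


def from_unified(embodiment: str, vec: List[float]) -> List[float]:
    """Gather the native action back out of a unified vector (e.g. a model output)."""
    return [vec[idx] for idx in EMBODIMENT_SLOTS[embodiment]]


def masked_mse(pred: List[float], target: List[float], mask: List[int]) -> float:
    """Mean squared error over real (mask == 1) dimensions only."""
    n = sum(mask)
    return sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask)) / n


if __name__ == "__main__":
    vec, mask = to_unified("dual_arm_14dof", [0.0] * 14 + [1.0, 0.0])
    pred = list(vec)
    pred[20] = 5.0  # garbage in a padded slot does not affect the loss
    print(masked_mse(pred, vec, mask))  # 0.0
```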

To support planning and to provide rewards for online RL, we fuse vision, touch, and joint proprioception, which markedly reduces state misrecognition under complex conditions. Each modality alone has blind spots: vision becomes unreliable under lighting changes or occlusion, while touch alone can be misled by unintended contacts.
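The value of combining modalities can be illustrated with naive-Bayes style log-odds fusion. Each sensor contributes evidence about a binary state (here, "nut seated on the pin"), so a confident tactile reading can outweigh an occluded camera, and agreeing vision and proprioception can overrule a stray contact. This is a swapped-in textbook technique, not the fusion model Xiaomi describes. The conditional-independence assumption, the example probabilities, and the reward threshold are all hypothetical.

```python
import math

# Sketch of log-odds (naive-Bayes) sensor fusion, swapped in for illustration;
# it is not the fusion model described in the article.


def logit(p: float) -> float:
    """Log-odds of a probability, clamped away from 0 and 1."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))


def fuse(probs: dict, prior: float = 0.5) -> float:
    """Combine per-modality posteriors P(seated | modality) into one posterior.

    Assumes modalities are conditionally independent given the true state,
    so their log-odds relative to the shared prior add up.
    """
    prior_lo = logit(prior)
    total = prior_lo + sum(logit(p) - prior_lo for p in probs.values())
    return 1.0 / (1.0 + math.exp(-total))


def sparse_reward(probs: dict, threshold: float = 0.9) -> float:
    """Hypothetical online-RL reward: 1.0 only when the fused belief is confident."""
    return 1.0 if fuse(probs) >= threshold else 0.0


if __name__ == "__main__":
    # Camera half-occluded, but touch and proprioception agree the nut is seated.
    seated = {"vision": 0.55, "tactile": 0.9, "proprio": 0.8}
    # A stray contact fools touch, but vision and proprioception disagree.
    stray = {"vision": 0.1, "tactile": 0.9, "proprio": 0.3}
    print(round(fuse(seated), 3), sparse_reward(seated))  # 0.978 1.0
    print(round(fuse(stray), 3), sparse_reward(stray))    # 0.3 0.0
```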

Self‑tapping nut and positioning pin structure

Whole-body motion control: A hybrid architecture combines optimization-based control with reinforcement learning. The optimizer uses quadratic programming with null-space projection to enforce four strict priority levels (balance, safety, task, and other metrics), solving each instance in under 1 ms. The RL controller is trained on a large-scale parallel simulation platform that exposes it to billions of random disturbances and failure scenarios, teaching the robot balance-preserving strategies that transfer zero-shot to the real robot.
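The strict-priority principle behind the optimizer can be illustrated with classic recursive null-space projection. Each task is solved only within the null space of all higher-priority tasks, so a lower-priority objective can never disturb balance or safety. This sketch covers equality tasks only, using an SVD-based pseudo-inverse. It omits the inequality constraints (such as joint or torque limits) that call for a QP in a real controller, and the Jacobians, targets, and priority labels are toy values.

```python
import numpy as np


def pinv_abs(A: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """Pseudo-inverse that zeroes singular values below an absolute tolerance,
    so numerically empty null spaces do not produce huge gains."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[s > tol] = 1.0 / s[s > tol]
    return Vt.T @ np.diag(s_inv) @ U.T


def prioritized_solve(tasks, n_dof: int) -> np.ndarray:
    """Solve J_i @ dq = b_i for tasks listed from highest to lowest priority."""
    dq = np.zeros(n_dof)
    N = np.eye(n_dof)  # projector onto the null space of all higher-priority tasks
    for J, b in tasks:
        JN = J @ N
        JN_pinv = pinv_abs(JN)
        # Fix only this task's remaining error, moving inside the allowed null space.
        dq = dq + JN_pinv @ (b - J @ dq)
        # Remove the directions this task now uses from the lower-priority null space.
        N = N - JN_pinv @ JN
    return dq


if __name__ == "__main__":
    tasks = [
        (np.array([[1.0, 0.0, 0.0]]), np.array([0.5])),  # "balance" (toy)
        (np.array([[1.0, 1.0, 0.0]]), np.array([1.0])),  # "safety" (toy)
        (np.array([[1.0, 0.0, 1.0]]), np.array([1.5])),  # "task" (toy)
        (np.array([[0.0, 1.0, 0.0]]), np.array([9.0])),  # "other": conflicts, so it is ignored
    ]
    dq = prioritized_solve(tasks, 3)
    print(dq)  # approximately [0.5 0.5 1. ]
```

The last toy task asks for a motion that the three higher-priority tasks have already used up, so it receives no correction at all. That is the behavior a strict hierarchy is meant to guarantee.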

Model framework and training pipeline

Typical failure cases: If the nut's spline is misaligned with the pin, the nut jams and the installation is incomplete. Variation in grasp pose and the magnetic pull further increase the difficulty, leading to a poor spline-pin fit or a missed placement.
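Part of why spline misalignment is so unforgiving follows from geometry. A spline with n teeth repeats every 360/n degrees, so only the yaw error modulo that pitch decides whether the teeth engage. The sketch below computes that residual and flags a placement as likely to jam when it exceeds a tolerance. The tooth count, the tolerance, and the function names are hypothetical, and in practice the yaw estimate itself would be uncertain because of the grasp-pose variation and magnetic pull mentioned above.

```python
# Hypothetical check, not from the article: a spline with n teeth is symmetric
# under rotations of 360/n degrees, so only the yaw error modulo that pitch matters.


def spline_residual_deg(yaw_error_deg: float, n_teeth: int) -> float:
    """Smallest rotation (degrees) that brings the nut's spline into register with the pin."""
    if n_teeth <= 0:
        raise ValueError("n_teeth must be positive")
    pitch = 360.0 / n_teeth
    r = yaw_error_deg % pitch  # Python's % returns a non-negative result for positive pitch
    return min(r, pitch - r)


def likely_to_jam(yaw_error_deg: float, n_teeth: int, tol_deg: float) -> bool:
    """Flag a placement whose residual misalignment exceeds the engagement tolerance."""
    return spline_residual_deg(yaw_error_deg, n_teeth) > tol_deg


if __name__ == "__main__":
    # Hypothetical 12-tooth spline (30-degree pitch) with a 3-degree tolerance.
    print(spline_residual_deg(31.0, 12), likely_to_jam(31.0, 12, 3.0))  # 1.0 False
    print(spline_residual_deg(14.0, 12), likely_to_jam(14.0, 12, 3.0))  # 14.0 True
```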

Loose spline‑pin fit

Beyond the self‑tapping‑nut station, other pilot stations (e.g., material box handling, badge installation) are being deployed. Scaling to broader industrial use still requires breakthroughs in production cadence, pass rate, whole‑body coordination, and dexterous hand efficiency.

All earlier technical details, experiment videos, and code are openly available:

Project page: https://sites.google.com/view/hil-daft/

arXiv paper: https://arxiv.org/abs/2509.13774

GitHub repository: https://github.com/XiaomiRobotics/Xiaomi-Robotics-0

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
