DexJoCo: First High‑Difficulty Benchmark with 11 Dexterous Manipulation Tasks Covering Four Core Abilities

DexJoCo, a new MuJoCo‑based benchmark from the Chinese Academy of Sciences, introduces 11 complex dexterous‑hand tasks spanning tool use, bimanual collaboration, long‑horizon execution, and reasoning, and reveals that even state‑of‑the‑art robot learning models still struggle with reliable fine‑grained manipulation.

Machine Heart

Recent advances in robot foundation models and dexterous-hand hardware have shifted robotic manipulation from simple grasping toward complex functional interaction, which raises the question of how to evaluate dexterous capability systematically. Existing benchmarks focus mainly on arm-and-gripper pick-and-place tasks and cannot assess tool use, bimanual coordination, long-horizon execution, or fine-grained interaction.

To address this gap, the Institute of Automation of the Chinese Academy of Sciences introduced DexJoCo, a MuJoCo‑based benchmark and toolkit for task‑oriented dexterous manipulation. DexJoCo defines 11 functional tasks that cover four core ability dimensions:

Tool use – e.g., watering a plant, hammering a nail, storing glasses, operating a mouse.

Bimanual collaboration – e.g., assembling with two hands, unlocking a tablet, taking a photo.

Long-horizon execution – e.g., a microwave sequence: opening the door, placing food inside, closing the door, and starting it.

Reasoning – e.g., solving a Tower of Hanoi step or entering a password based on a language instruction.

Unlike traditional pick‑and‑place benchmarks, DexJoCo emphasizes functional interaction, finger‑level control, task‑sequence understanding, and two‑hand coordination, enabling researchers to probe the limits of dexterous hands in realistic scenarios.

The benchmark provides a complete workflow: task construction → human teleoperation → trajectory collection → data format conversion → model training → policy evaluation. Human demonstrations are captured with Rokoko Smartgloves for finger motion and an HTC Vive Tracker with Base Station for wrist tracking; a remapping module then transfers the human hand motion to an Allegro Hand. The hardware setup costs roughly $2,300, lowering the barrier to collecting high-quality dexterous data.
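The remapping step can be pictured with a minimal sketch. It is not DexJoCo's actual module, which the article does not describe. It assumes the glove delivers four joint angles per finger, uses a hypothetical uniform joint limit, and relies only on the fact that the Allegro Hand has four fingers with four joints each (16 values in total), so a human little finger has no counterpart and is ignored. The names `retarget`, `ALLEGRO_FINGERS`, and `JOINT_LIMITS` are illustrative.

```python
from typing import Dict, List, Tuple

# The Allegro Hand has four fingers with four joints each (16 joints in total).
ALLEGRO_FINGERS = ("index", "middle", "ring", "thumb")
JOINTS_PER_FINGER = 4

# Hypothetical uniform limit in radians; real limits differ per joint.
JOINT_LIMITS: Tuple[float, float] = (-0.3, 1.6)


def retarget(human_angles: Dict[str, List[float]], gain: float = 1.0) -> List[float]:
    """Map per-finger human joint angles to a flat 16-value robot command."""
    lo, hi = JOINT_LIMITS
    command: List[float] = []
    for finger in ALLEGRO_FINGERS:
        # Fingers missing from the input default to a neutral pose.
        angles = human_angles.get(finger, [0.0] * JOINTS_PER_FINGER)
        if len(angles) != JOINTS_PER_FINGER:
            raise ValueError(f"{finger}: expected {JOINTS_PER_FINGER} angles")
        # Scale, then clamp so the command stays inside the assumed limits.
        command.extend(min(hi, max(lo, gain * a)) for a in angles)
    return command
```

Practical retargeting usually matches fingertip positions rather than copying joint angles, because human and robot hands differ in proportions; the direct mapping above only illustrates where such a module sits in the pipeline.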

The collected data (about 1,100 human teleoperation trajectories) can be exported to common formats such as LeRobot and Diffusion Policy Zarr, so that models such as ACT, Diffusion Policy, π₀.₅, and GR00T‑N1.5 can be trained and evaluated on it directly.
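To make the format-conversion step concrete, here is a hedged sketch of one common way to lay out demonstrations for array stores such as Zarr: all steps are concatenated into flat arrays, and a separate index records where each episode ends. The article does not specify DexJoCo's converter or its field names, so `state`, `action`, and `episode_ends` are assumptions, and plain Python lists stand in for the on-disk arrays.

```python
from typing import Dict, List, Sequence

Step = Dict[str, List[float]]  # assumed per-step record: {"state": [...], "action": [...]}


def flatten_episodes(episodes: Sequence[Sequence[Step]]) -> Dict[str, list]:
    """Concatenate episodes and record the exclusive end index of each one."""
    states: List[List[float]] = []
    actions: List[List[float]] = []
    episode_ends: List[int] = []
    for episode in episodes:
        for step in episode:
            states.append(step["state"])
            actions.append(step["action"])
        episode_ends.append(len(states))
    return {"state": states, "action": actions, "episode_ends": episode_ends}


def episode_actions(flat: Dict[str, list], i: int) -> List[List[float]]:
    """Recover the actions of episode i from the flat layout."""
    ends = flat["episode_ends"]
    start = ends[i - 1] if i > 0 else 0
    return flat["action"][start:ends[i]]
```

Storing one long array per field keeps variable-length trajectories in a single chunked store, while the end index lets a trainer sample windows without crossing episode boundaries.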

Evaluation on DexJoCo shows that even state-of-the-art robot learning policies remain far from reliable. Success rates drop when visual conditions (camera angle, lighting, table texture) change, and failures are frequent in bimanual, insertion, and button-pressing tasks. Models often succeed at the initial grasp but become unstable during fine interaction steps such as precise button pressing, accurate hole insertion, or sustained tool handling, and they can lose track of the task sequence in long-horizon scenarios.
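The kind of comparison described here can be sketched as a small aggregation over evaluation rollouts. This is an illustrative helper, not DexJoCo's evaluation code: the rollout tuple layout and the condition labels `nominal` and `lighting` are assumptions, and no benchmark numbers are implied.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Rollout = Tuple[str, str, bool]  # assumed layout: (task, visual condition, success)


def success_rates(rollouts: Iterable[Rollout]) -> Dict[Tuple[str, str], float]:
    """Return the success rate for every (task, condition) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, trials]
    for task, condition, success in rollouts:
        counts[(task, condition)][0] += int(success)
        counts[(task, condition)][1] += 1
    return {key: wins / trials for key, (wins, trials) in counts.items()}


def robustness_drop(
    rates: Dict[Tuple[str, str], float],
    task: str,
    baseline: str = "nominal",
    shifted: str = "lighting",
) -> float:
    """Success-rate loss when moving from the baseline to a shifted condition."""
    return rates[(task, baseline)] - rates[(task, shifted)]
```

Reporting the drop per task, rather than a single average, makes it easier to see whether a visual change hurts mostly the fine-interaction tasks or all tasks alike.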

These results indicate a substantial gap between current robot policies and stable, reliable human‑level dexterous manipulation, highlighting the need for better unified modeling of vision, language, touch, and high‑dimensional hand actions.

DexJoCo's broader goal is not to produce a leaderboard but to offer a standardized, reproducible, and extensible platform that helps the community answer key questions: Where do dexterous hands truly outperform simple grippers? Can current vision-language-action (VLA) models handle high-dimensional hand action spaces? Which data-capture methods best support dexterous tasks? And how should task design drive progress toward human-level robot manipulation?

[Figure: DexJoCo overview]
[Figure: DexJoCo workflow diagram]
[Figure: Performance of modern robot policies on DexJoCo]
[Figure: Failure summary for the π₀.₅ model]
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
