How RoboScience’s Bi-Adapt Framework Tackles Embodied Intelligence Generalization Bottlenecks
RoboScience’s team secured consecutive ICRA best‑paper finalist spots with Bi‑Adapt and D(R,O) Grasp, presenting a few‑shot bimanual adaptation framework and a unified grasp model that together bridge top‑tier research to scalable embodied AI by overcoming cross‑category generalization challenges.
At ICRA 2026 in Vienna, the RoboScience team led by Shao Lin had two papers, Bi-Adapt and D(R,O) Grasp, selected as best-paper finalists in the Robot Manipulation and Locomotion track. The selection continues a streak that began with the 2025 award for D(R,O) Grasp, and both papers target the same core bottleneck in embodied intelligence: generalization.
Bi-Adapt addresses the difficulty of bimanual manipulation, where two hands must coordinate their contact points and motion directions. Traditional approaches either hand-craft actions for each object or rely on massive amounts of data and training. Bi-Adapt instead treats the two arms as interdependent modules: it first trains the second hand to cooperate given the first hand's actions, then trains the first hand to create favorable conditions for the second.
To transfer to unseen objects, Bi-Adapt uses a diffusion-feature (DIFT) visual backbone to find semantic correspondences between known and novel objects, which enables few-shot, trial-and-error refinement of contact points. The resulting three-step pipeline (locate, coordinate, then correct with minimal trials) achieves 59%–70% success on five new bimanual tasks in simulation, significantly surpassing baselines such as M-Where2Act and DualAfford, and it also succeeds on real-world tasks such as opening, unfolding, and screwing.
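The locate-then-correct idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Bi-Adapt implementation: the per-point feature vectors stand in for DIFT features, and `rank_candidates`, `refine_with_trials`, and the `try_contact` callback are hypothetical names introduced here. The sketch ranks points on a novel object by cosine similarity to the feature of a known-good contact point, then spends a small trial budget on the best candidates.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(source_feature, target_features):
    """Return indices of target points, most similar to the source contact first."""
    scores = [(cosine(source_feature, f), i) for i, f in enumerate(target_features)]
    return [i for _, i in sorted(scores, key=lambda s: -s[0])]

def refine_with_trials(candidates, try_contact, max_trials=3):
    """Try the top candidates in order; return the first that succeeds, else None.

    try_contact is a hypothetical callback that executes one trial and
    reports whether the contact point worked.
    """
    for idx in candidates[:max_trials]:
        if try_contact(idx):
            return idx
    return None

# Toy example: one contact feature on a known object, three points on a novel one.
source_feature = [1.0, 0.0]
target_features = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.05]]
ranking = rank_candidates(source_feature, target_features)
chosen = refine_with_trials(ranking, try_contact=lambda i: i == 1)
```

In this toy run the most similar point fails its trial and the second candidate is accepted, which is the kind of correction a few-shot trial budget is meant to allow.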
The D(R,O) Grasp paper learns a unified point‑cloud representation of both robot hand and object, allowing a single AI model to support multiple grippers (LeapHand, Shadow Lite, XHand, SoftHand). This eliminates the “one‑machine‑one‑policy” limitation and demonstrates cross‑gripper generalization for complex grasping.
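One way to picture a representation shared across grippers is a matrix of distances between points sampled on the robot hand and points sampled on the object. The article does not spell out the representation, so reading the name D(R,O) as such a distance matrix is an assumption here, and `pairwise_distances` and `closest_object_point` are hypothetical helpers. The sketch only illustrates why a structure like this is gripper-agnostic: its shape depends on the number of sampled points, not on the hand's joint layout.

```python
import math

def pairwise_distances(robot_points, object_points):
    """Matrix D where D[i][j] is the Euclidean distance between
    robot point i and object point j."""
    return [[math.dist(r, o) for o in object_points] for r in robot_points]

def closest_object_point(distance_row):
    """Index of the object point nearest to one robot point."""
    return min(range(len(distance_row)), key=lambda j: distance_row[j])

# Toy point clouds: two points on a hand, two points on an object.
robot_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
object_points = [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0)]

D = pairwise_distances(robot_points, object_points)
nearest = [closest_object_point(row) for row in D]
```

Swapping in a different gripper changes only the contents of `robot_points`; downstream code that consumes the matrix stays the same.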
Both works are integrated into the VLOA (Vision-Language-Object-Action) architecture, a dual-engine system. The upper-level embodied world model predicts object trajectories (changes in position, pose, and shape) from multimodal inputs, while the lower-level universal operation model converts these trajectories into robot actions using large-scale physics simulation. Using object trajectories as the intermediate representation unifies data from internet videos, manuals, and other sources, so that learning stays consistent across diverse modalities.
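The two-level split can be illustrated with a toy pipeline. Everything here is a hypothetical stand-in rather than VLOA's actual interface: `predict_object_trajectory` replaces the learned world model with straight-line interpolation, and `trajectory_to_actions` replaces the operation model with a fixed offset between end-effector and object. The point is only that the object trajectory is the sole interface between the two levels.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class ObjectWaypoint:
    position: Vec3  # object position in the world frame

def predict_object_trajectory(start: Vec3, goal: Vec3, steps: int) -> List[ObjectWaypoint]:
    """Stand-in for the upper level: linear interpolation from start to goal."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [
        ObjectWaypoint(tuple(s + (g - s) * k / steps for s, g in zip(start, goal)))
        for k in range(steps + 1)
    ]

def trajectory_to_actions(trajectory: List[ObjectWaypoint], grasp_offset: Vec3) -> List[Vec3]:
    """Stand-in for the lower level: end-effector targets that keep a
    fixed offset from the object at every waypoint."""
    return [
        tuple(p + o for p, o in zip(wp.position, grasp_offset))
        for wp in trajectory
    ]

# Move an object 1 m along x in two steps, gripping 10 cm above it.
trajectory = predict_object_trajectory((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), steps=2)
actions = trajectory_to_actions(trajectory, grasp_offset=(0.0, 0.0, 0.1))
```

Because the lower level sees only waypoints, the source of the trajectory (a video, a manual, a simulator) does not matter to it, which mirrors the unifying role the article assigns to the intermediate representation.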
By feeding Bi-Adapt and D(R,O) Grasp into VLOA, RoboScience aims to let any robot manipulate any object with any end-effector. The company showcased this direction in a large-scale furniture-assembly task in May 2025, illustrating the transition from academic breakthroughs to scalable embodied AI deployment.
RoboScience, founded in 2024 by Shao Lin (Stanford PhD) and Tian Ye (Stanford AI Lab), is building a full-stack pipeline that includes the high-precision RoboMirage physics simulator. The company has attracted multiple corporate venture capital (CVC) investments, positioning the team to accelerate the industrial rollout of embodied intelligence.
