LaST‑R1: Embodied Robot Model Hits 99.9% LIBERO Success via Physical Reasoning
LaST‑R1 is a new embodied AI framework that inserts latent physical reasoning before action generation and jointly optimizes reasoning and control with LAPO. With only a single warm‑up trajectory it achieves 99.9% average success on the LIBERO benchmark; in real‑world tasks it boosts success from 52.5% to 93.75% and generalizes markedly better to unseen objects, backgrounds, and lighting.
The paper introduces LaST‑R1, a continuation of the LaST₀ base model, as a novel paradigm for post‑training embodied large models. Unlike prior approaches that only optimize the action space, LaST‑R1 adds a latent chain‑of‑thought (latent CoT) stage that first models scene structure, object relations, and future physical dynamics in a latent space before generating actions.
Key components:
Latent Reasoning‑before‑Acting: Given visual observations and language instructions, the model produces latent reasoning embeddings that encode physical intuition, which then guide action token generation.
LAPO (Latent‑to‑Action Policy Optimization): A reinforcement‑learning objective that simultaneously optimizes latent reasoning and action generation. Successful trajectories reinforce both the chosen actions and the preceding latent reasoning, while failures adjust the internal physical understanding.
Adaptive Latent CoT: The model learns a <latent_end> token to decide dynamically how long to reason before acting, allowing simple states to be executed quickly and complex contact‑rich states to receive more deliberation.
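The paper's code is not public, so the control flow above can only be sketched. The toy policy below (all names, the `difficulty` field, and the stopping heuristic are hypothetical stand-ins for the real VLA backbone) illustrates the reasoning-before-acting loop: latent embeddings accumulate until the model emits `<latent_end>` or exhausts a budget, and only then are actions decoded.

```python
LATENT_END = "<latent_end>"

class ToyLatentPolicy:
    """Stand-in for the LaST-R1 backbone (weights/API are not public).
    Pretends that harder scenes need more latent steps before <latent_end>."""

    def reason_step(self, ctx, latents):
        # Stop reasoning once enough latent steps have been taken for this scene.
        if len(latents) >= ctx["difficulty"]:
            return LATENT_END, None
        # Otherwise emit one more latent reasoning embedding (here just a string).
        return "latent", f"z{len(latents)}"

    def act(self, ctx, latents):
        # Decode action tokens conditioned on the latent reasoning prefix.
        return [f"a{i}" for i in range(3)]

def reason_then_act(policy, ctx, max_latent=16):
    """Adaptive latent CoT: reason until <latent_end> (or budget), then act."""
    latents = []
    for _ in range(max_latent):
        token, z = policy.reason_step(ctx, latents)
        if token == LATENT_END:  # model decides it has deliberated enough
            break
        latents.append(z)
    return latents, policy.act(ctx, latents)

easy = reason_then_act(ToyLatentPolicy(), {"difficulty": 1})
hard = reason_then_act(ToyLatentPolicy(), {"difficulty": 6})
print(len(easy[0]), len(hard[0]))  # simple states reason briefly, complex ones longer
```

The key design point, per the paper, is that reasoning length is not fixed: the `<latent_end>` decision lets simple states execute quickly while contact-rich states receive more deliberation.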
The overall training pipeline consists of three stages: (1) latent reasoning before acting, (2) joint optimization of latent and action via LAPO, and (3) adaptive determination of reasoning length.
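Stage (2), the LAPO objective, is described only at a high level, but its stated mechanism (one trajectory-level advantage scaling the gradients of both the latent-reasoning log-probabilities and the action log-probabilities) can be sketched with a REINFORCE-style update on a deliberately tiny policy. Everything here is a hypothetical simplification: two scalar parameters stand in for the latent and action heads, and the true objective may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lapo_step(theta_latent, theta_action, reward, baseline, lr=0.5):
    """REINFORCE-style sketch of LAPO's joint update (toy, not the paper's code).

    One Bernoulli 'latent choice' and one Bernoulli 'action choice' are both
    parameterized by a single logit. The SAME advantage scales both gradients,
    so a successful trajectory reinforces the preceding latent reasoning along
    with the chosen actions, and a failure adjusts both.
    """
    p_latent, p_action = sigmoid(theta_latent), sigmoid(theta_action)
    advantage = reward - baseline
    # For the sampled "1" branch of Bernoulli(sigmoid(theta)),
    # d log p / d theta = 1 - p.
    theta_latent += lr * advantage * (1.0 - p_latent)
    theta_action += lr * advantage * (1.0 - p_action)
    return theta_latent, theta_action

# A successful trajectory (reward above baseline) raises both logits:
tl, ta = lapo_step(0.0, 0.0, reward=1.0, baseline=0.25)
print(tl > 0.0 and ta > 0.0)  # True
# A failed trajectory (reward below baseline) lowers both:
tl2, ta2 = lapo_step(0.0, 0.0, reward=0.0, baseline=0.25)
print(tl2 < 0.0 and ta2 < 0.0)  # True
```

This mirrors the claim that credit assignment reaches the reasoning stage: unlike action-only PPO, failure does not just suppress the action distribution but also shifts the internal physical understanding that produced it.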
Experimental results:
Simulation (LIBERO benchmark): With only one warm‑up trajectory, LaST‑R1 reaches 99.9% average success (suite scores 99.8%–100.0%), surpassing strong baselines such as OpenVLA‑OFT, π0.5, SimpleVLA‑RL, and πRL. It converges faster than action‑only PPO.
Real‑world robot tasks: Using 30 warm‑up trajectories, the average success rate improves from 52.5% to 93.75%, far exceeding the 71.25% achieved by the SOTA π0.5 model that uses 100 expert trajectories.
Generalization (OOD tests): When object identity, background, or lighting is altered, LaST‑R1’s performance degrades far less than π0.5’s, indicating that the model learns transferable physical semantics rather than memorizing specific motion trajectories.
The authors emphasize that LaST‑R1 shifts embodied model training from "see‑and‑act" toward "think‑then‑act," enabling robots to develop a form of physical intuition that supports more robust and adaptable manipulation in diverse real‑world settings.
Machine Learning Algorithms & Natural Language Processing
