OneModel 1.7 Hits 99% LIBERO Success, Bridging ‘Seeing’ to ‘Doing’ with Implicit Predictive Policy
OneModel 1.7 FrontoStria‑RL achieves a 99% average success rate on the LIBERO benchmark, surpassing π0.5, GR00T‑N1.5 and OpenVLA‑OFT, by introducing a Predictive Policy Latent that implicitly links world‑model understanding to action execution and is continuously refined through a reinforcement‑learning loop and a Retrieve‑then‑Steer memory mechanism.
Problem Context
World‑action models (WAM) aim to learn environmental dynamics and corresponding robot actions, but a persistent gap exists: the world model can “understand” changes while the action policy still “does the wrong thing.” This transmission gap hinders reliable embodied intelligence in household settings.
OneModel 1.7 FrontoStria‑RL Performance
On the LIBERO benchmark the model achieves a 99 % average success rate, surpassing π0.5, GR00T‑N1.5 and OpenVLA‑OFT. Reported real‑world success rates are 99 % on daily tasks, 97 % on high‑precision tasks, and 91.2 % on a human‑vs‑robot table‑tennis scenario. The improvement is attributed to an implicit transmission pathway, the Predictive Policy Latent, combined with a reinforcement‑learning (RL) closed‑loop mechanism.
Two Main Research Routes
The VLA (Vision‑Language‑Action) route maps visual observations and language commands directly to robot actions. It works well when the training data fully covers the scenario, but it degrades under object position shifts, viewpoint changes, lighting variations, or multi‑stage tasks because the policy loses track of the global goal.
The World Model route builds predictive representations of environment states and task evolution, which in theory generalize better. However, the world model’s understanding does not guarantee correct actions: passing explicit future images or intermediate coordinates to the policy introduces pixel redundancy, generation errors, and inference latency, leaving a gap between understanding and execution.
Predictive Policy Latent
Instruction / Observation / Skill → World Model → Predictive Policy Latent → Understand Expert → Action Expert → Robot Execution → RL / Success Memory / HITL ↺
The Predictive Policy Latent (PPL) is an implicit conduit that transforms high‑level world‑model insights into modulation signals for the Action Expert, avoiding explicit image or coordinate generation.
Training phase : the model observes the future outcome that follows an action, which shapes an implicit physical‑reasoning representation of good action directions.
Deployment phase : the model relies solely on the current observation to produce an equivalent modulation signal. Because no future image is generated, less information has to be transmitted, inference is faster, and generative noise is avoided.
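The train-with-future, deploy-with-present split can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the 1-D world in `hypothetical_world`, the "teacher" latent defined as the observed change, and the linear "student" fitted by least squares. It is not the OneModel architecture, only the general pattern of learning from future observations and then predicting the same signal from the current observation alone.

```python
# Toy sketch only: a 1-D world, not the OneModel architecture.

def hypothetical_world(obs):
    """Assumed dynamics for the demo: the state moves halfway toward 2.0."""
    return 0.5 * obs + 1.0


def teacher_latent(obs, future_obs):
    """Training-only signal: the change actually observed after acting."""
    return future_obs - obs


def fit_student(observations, targets):
    """Closed-form least squares for latent = w * obs + b."""
    n = len(observations)
    mean_x = sum(observations) / n
    mean_y = sum(targets) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(observations, targets))
    var = sum((x - mean_x) ** 2 for x in observations)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# Training: future observations are available and define the target latent.
train_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
train_targets = [teacher_latent(o, hypothetical_world(o)) for o in train_obs]
w, b = fit_student(train_obs, train_targets)


def student_latent(obs):
    """Deployment: only the current observation is used."""
    return w * obs + b


# The student reproduces the teacher's signal without seeing the future.
unseen_obs = 10.0
predicted = student_latent(unseen_obs)
actual = teacher_latent(unseen_obs, hypothetical_world(unseen_obs))
```

In this toy world the student matches the teacher exactly because the dynamics are linear; a real system would use learned encoders and would only approximate the future-informed signal.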
RL Closed‑Loop and Retrieve‑then‑Steer
The architecture adds an RL closed‑loop with explicit rewards, safety constraints, and human‑in‑the‑loop (HITL) supervision, allowing the policy to surpass the imitation‑learning ceiling.
Retrieve‑then‑Steer treats successful execution traces as “experience memory.” It stores these traces, retrieves the ones most similar to the current state, filters them, and uses them to guide future actions, enabling continual improvement without retraining.
Store : during deployment, successful observation‑action fragments are saved to a long‑term Success Memory.
Retrieve : at inference, the system fetches fragments relevant to the current state.
Filter : trajectory‑level consistency filters out inconsistent candidates.
Guide : filtered fragments are aggregated into elite priors; Confidence‑Adaptive Prior Guidance injects them into a flow‑matching action sampler, adjusting guidance strength based on retrieval confidence.
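The four steps above can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the published method: `SuccessMemory`, `filter_consistent`, and `steer` are hypothetical names, the filter is a simple median-based outlier check standing in for trajectory-level consistency, and the guidance step is a linear blend of one sampled action standing in for injection into a flow-matching sampler.

```python
import math
import statistics


class SuccessMemory:
    """Hypothetical store of successful (observation, action) fragments."""

    def __init__(self):
        self.fragments = []

    def store(self, observation, action):
        self.fragments.append((list(observation), list(action)))

    def retrieve(self, observation, k=3):
        """Return the k most similar fragments as (similarity, action) pairs."""
        scored = []
        for stored_obs, action in self.fragments:
            distance = math.dist(observation, stored_obs)
            scored.append((1.0 / (1.0 + distance), action))  # similarity in (0, 1]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def filter_consistent(candidates, tolerance=0.5):
    """Stand-in filter: drop actions far from the per-dimension median."""
    if not candidates:
        return []
    dims = len(candidates[0][1])
    median = [statistics.median(a[d] for _, a in candidates) for d in range(dims)]
    return [(s, a) for s, a in candidates if math.dist(a, median) <= tolerance]


def steer(sampled_action, candidates, max_strength=0.8):
    """Blend the sampler's action toward a similarity-weighted prior.

    Guidance strength scales with the best retrieval similarity, so weak
    matches barely move the action and an empty memory leaves it unchanged.
    """
    if not candidates:
        return list(sampled_action)
    total = sum(s for s, _ in candidates)
    prior = [sum(s * a[d] for s, a in candidates) / total
             for d in range(len(sampled_action))]
    strength = max_strength * max(s for s, _ in candidates)
    return [(1 - strength) * x + strength * p
            for x, p in zip(sampled_action, prior)]


memory = SuccessMemory()
memory.store([0.0, 0.0], [1.0, 0.0])
memory.store([0.1, 0.0], [1.0, 0.1])
memory.store([0.0, 0.1], [-3.0, 0.0])  # nearby state, inconsistent action
memory.store([5.0, 5.0], [0.0, 1.0])   # far from the query state

candidates = filter_consistent(memory.retrieve([0.0, 0.0], k=3))
action = steer([0.0, 0.0], candidates)
```

Here the distant fragment is never retrieved, the inconsistent one is filtered out, and the sampled action is pulled toward the two agreeing fragments. Tying the blend weight to similarity is one simple way to make guidance confidence-adaptive; the actual mechanism may differ.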
On the SimplerEnv benchmark, Retrieve‑then‑Steer raises CogACT’s average success rate from 75.8 % to 79.5 %, a gain of 3.7 percentage points.
Supporting Modules
Understand Expert + Skill receives the PPL signal, decomposes tasks into stages and sub‑goals, and schedules the appropriate skill sequence, providing structured planning for long‑horizon tasks.
MCF‑Proto builds a Motion‑Centric Action Frame around task‑relevant local structures (e.g., door hinges, rails) and learns a set of reusable motion prototypes. Operating in a transformed local coordinate system makes the module tolerant to camera viewpoint changes and robot pose deviations.
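Why a local coordinate frame helps with viewpoint changes can be shown with a small 2-D geometry sketch. The poses, the hinge frame, and the helper names below are illustrative assumptions, not part of MCF‑Proto itself: the point is only that coordinates expressed relative to a frame attached to the object stay the same when the camera (the outer frame) moves rigidly.

```python
import math


def transform_point(point, pose):
    """Apply a rigid transform: rotate by pose theta, then translate."""
    px, py, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * point[0] - s * point[1] + px,
            s * point[0] + c * point[1] + py)


def transform_frame(frame, pose):
    """Apply the same rigid transform to a frame (x, y, heading)."""
    x, y = transform_point(frame[:2], pose)
    return (x, y, frame[2] + pose[2])


def to_local(point, frame):
    """Express a point in the local frame (both given in the same outer frame)."""
    fx, fy, ftheta = frame
    dx, dy = point[0] - fx, point[1] - fy
    c, s = math.cos(ftheta), math.sin(ftheta)
    return (c * dx + s * dy, -s * dx + c * dy)


# Hypothetical scene as seen from camera A: a hinge frame and a handle point.
hinge_in_a = (1.0, 2.0, math.pi / 6)
handle_in_a = (1.5, 2.5)

# The same scene seen from camera B, which differs by a rigid transform.
a_to_b = (0.3, -0.2, 0.4)
hinge_in_b = transform_frame(hinge_in_a, a_to_b)
handle_in_b = transform_point(handle_in_a, a_to_b)

local_from_a = to_local(handle_in_a, hinge_in_a)
local_from_b = to_local(handle_in_b, hinge_in_b)
```

`local_from_a` and `local_from_b` agree up to floating-point error, so anything learned in the hinge frame sees the same input from either camera. In practice the frame must be estimated from observations, so this invariance holds only as well as that estimate does.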
In LIBERO‑plus disturbance tests, MCF‑Proto achieves the best results in six of seven categories, leading the baseline by 3.3 percentage points on camera perturbations (69.7 % vs. 66.4 %) and 15.7 percentage points on robot pose perturbations (66.0 % vs. 50.3 %).
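The gaps quoted for the LIBERO‑plus tests are differences between two success rates, which is not the same as a relative improvement over the baseline. A short check using only the four numbers above makes the distinction explicit; the `gain` helper is a hypothetical name introduced for this example.

```python
def gain(score, baseline):
    """Return (percentage-point gap, relative gain in percent), rounded to 0.1."""
    points = score - baseline
    relative = 100.0 * points / baseline
    return round(points, 1), round(relative, 1)


camera_points, camera_relative = gain(69.7, 66.4)
pose_points, pose_relative = gain(66.0, 50.3)
```

The gap on robot pose perturbations is larger in both senses, partly because the baseline starts from a lower success rate there.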
Benchmark and Real‑World Validation
Daily operations (washing dishes, stacking clothes) – 99 % average success.
High‑precision tasks (inserting test tubes, pouring coffee beans) – 97 % average success.
Extreme dynamic task – table‑tennis ball return – 91.2 % success.
Key Comparisons
Compared with π0.5, GR00T‑N1.5, π0.7 and DreamZero, OneModel 1.7 is the only model that provides:
An implicit transmission channel (Predictive Policy Latent) that conveys world‑model understanding to the action policy without generating intermediate images or videos.
A reinforcement‑learning closed‑loop that enables continual adaptation after deployment.
References
Paper: https://arxiv.org/abs/2605.11809
Paper: https://arxiv.org/abs/2605.10094
Official site: https://www.onerobot.com/OneModel
