Can Hyperbolic Embeddings Boost Multi‑Step Visual Planning? Introducing GeoWorld
GeoWorld addresses two weaknesses of energy‑based predictive world models, geometric neglect and poor multi‑step performance, by mapping latent representations onto a hyperbolic manifold and training with a geometry‑aware reinforcement learning framework. On long‑horizon visual planning benchmarks it improves success rate over V‑JEPA 2 by roughly 3% (3‑step tasks) and 2% (4‑step tasks) in absolute terms.
Introduction
Autoregressive next‑token prediction endows large language models (LLMs) and vision‑language models (VLMs) with extensive world knowledge, yet these models cannot represent physical or geometric properties of the environment. Energy‑based predictive world models address this gap by learning latent energy landscapes that measure compatibility between current and goal states, enabling multi‑step hierarchical planning without generating pixels.
Existing predictive models suffer from two fundamental limitations:
Geometric neglect: latent representations are learned in Euclidean space, which discards geodesic distances and hierarchical embeddings, weakening long‑range geometric consistency.
Multi‑step shortcoming: scarcity of long‑horizon video data forces training on single‑step transitions, causing rapid performance decay as the planning horizon grows.
GeoWorld: A Geometry‑Aware World Model
GeoWorld introduces a geometry‑aware predictive framework that operates on a hyperbolic latent manifold, preserving hierarchical relationships and geodesic structure during multi‑step rollouts.
Hyperbolic Joint‑Embedding Predictive Architecture (H‑JEPA)
H‑JEPA maps latent vectors from Euclidean space $\mathbb{R}^n$ to a hyperbolic manifold $\mathbb{H}^n$ via an exponential map. In hyperbolic space, the geodesic distance naturally encodes hierarchy, allowing the model to learn dynamics along hyperbolic geodesics. The predictor is trained to minimize the hyperbolic energy between successive latent states while regularizing with the hyperbolic triangle inequality, ensuring that the learned energy landscape respects the underlying geometry.
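To make the exponential map and the geodesic distance concrete, here is a minimal sketch in plain Python. It assumes the Poincaré ball model with curvature -c and an exponential map taken at the origin. The article does not say which hyperbolic model or base point GeoWorld uses, so the function names `exp_map_origin` and `poincare_distance`, the parameter `c`, and the example vector are illustrative assumptions, not the authors' implementation.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map a Euclidean tangent vector v at the origin onto the Poincare ball.

    Assumption: Poincare ball of curvature -c; the article does not name the model.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    # tanh squashes the norm so the result stays inside the ball of radius 1/sqrt(c).
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball of curvature -c."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    # Clamp to 1.0 to guard against rounding error before acosh.
    return math.acosh(max(arg, 1.0)) / math.sqrt(c)


# Hypothetical 2-D latent vector, chosen only for illustration.
z = exp_map_origin([0.3, 0.4])
print(poincare_distance([0.0, 0.0], z))
```

Under this convention the distance from the origin to the mapped point is twice the Euclidean norm of the input vector, so the printed value is close to 1.0. Distances grow without bound as points approach the boundary of the ball, which is the property that lets hyperbolic space embed tree-like hierarchies compactly.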
Geometric Reinforcement Learning (GRL)
GRL treats the hyperbolic energy as a value function: lower hyperbolic energy corresponds to higher expected return. Instead of training a separate policy, GRL directly optimizes the predictor by:
Minimizing hyperbolic geodesic distances between predicted and target latent states.
Applying triangle‑inequality regularization to enforce geodesic‑consistent rollouts.
This energy‑based optimization yields stable multi‑step planning without additional reward models.
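The two ingredients listed above can be sketched as a single loss. The article does not give the exact form of the triangle‑inequality regularizer, so the version below is one plausible reading, not the authors' formula: it penalizes the slack d(a,b) + d(b,c) - d(a,c) over consecutive predicted states, which is zero exactly when the middle state lies on the geodesic between its neighbors. The Poincaré ball distance, the weight `lam`, and the names `triangle_slack` and `grl_loss` are assumptions made for illustration.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance in the unit Poincare ball (curvature -1, an assumption)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y))
    return math.acosh(max(arg, 1.0))


def triangle_slack(a, b, c):
    """Non-negative slack of the triangle inequality; zero when b is on the geodesic a-c."""
    slack = poincare_distance(a, b) + poincare_distance(b, c) - poincare_distance(a, c)
    return max(slack, 0.0)


def grl_loss(predicted, targets, lam=0.1):
    """Hypothetical GRL-style objective: geodesic matching plus a geodesic-consistency penalty."""
    # Term 1: mean geodesic distance between predicted and target latent states.
    match = sum(poincare_distance(p, t) for p, t in zip(predicted, targets)) / len(predicted)
    # Term 2: mean triangle slack over consecutive triples of predicted states.
    triples = list(zip(predicted, predicted[1:], predicted[2:]))
    reg = sum(triangle_slack(a, b, c) for a, b, c in triples) / len(triples) if triples else 0.0
    return match + lam * reg


# Toy 3-step rollouts in a 2-D ball; the coordinates are made up for illustration.
on_geodesic = [[-0.5, 0.0], [0.0, 0.0], [0.5, 0.0]]
bent = [[-0.5, 0.0], [0.0, 0.3], [0.5, 0.0]]
print(grl_loss(on_geodesic, on_geodesic))  # near zero: matches targets and follows a geodesic
print(grl_loss(bent, on_geodesic))         # positive: middle state is off target and off the geodesic
```

In a real training loop these quantities would be computed on batched tensors with automatic differentiation. The scalar version here only shows how the two terms combine.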
Experimental Evaluation
GeoWorld was evaluated on two long‑horizon goal‑conditioned visual planning benchmarks:
CrossTask (reference [88])
COIN (reference [71])
Success rate (SR) was measured for 3‑step and 4‑step planning tasks. Compared with the state‑of‑the‑art predictive model V‑JEPA 2, GeoWorld achieved:
≈ 3% absolute SR improvement on 3‑step tasks.
≈ 2% absolute SR improvement on 4‑step tasks.
These gains suggest that hyperbolic embedding and geometric reinforcement learning improve long‑horizon stability and planning performance.
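For readers unfamiliar with the metric, the sketch below shows how a success rate of this kind is typically computed. It assumes the strict definition common in procedure‑planning work, where a plan counts as a success only if every predicted action matches the reference in order. The article does not spell out its exact protocol, and the function name `success_rate` and the action labels are made up for illustration.

```python
def success_rate(predicted_plans, reference_plans):
    """Percentage of plans whose full action sequence matches the reference exactly.

    Assumption: strict all-steps-correct success, as commonly used in procedure planning.
    """
    if len(predicted_plans) != len(reference_plans):
        raise ValueError("predicted and reference plan lists must have the same length")
    if not predicted_plans:
        return 0.0
    hits = sum(1 for p, r in zip(predicted_plans, reference_plans) if list(p) == list(r))
    return 100.0 * hits / len(predicted_plans)


# Toy 3-step plans with invented action labels.
predicted = [["pour", "stir", "serve"], ["cut", "stir", "serve"]]
reference = [["pour", "stir", "serve"], ["cut", "fry", "serve"]]
print(success_rate(predicted, reference))  # 50.0
```

An "absolute" improvement is then the difference between two such percentages, measured in percentage points, rather than a ratio of one to the other.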
Contributions
Proposed GeoWorld, a geometry‑aware world model that embeds latent states into a hyperbolic manifold, preserving hierarchical geometry.
Introduced Hyperbolic JEPA (H‑JEPA), which learns dynamics along hyperbolic geodesics and constructs an energy landscape consistent with the physical world.
Developed Geometric Reinforcement Learning (GRL), an energy‑based optimization framework that enforces geodesic‑consistent rollouts via hyperbolic energy minimization and triangle‑inequality regularization.
Validated the approach on CrossTask and COIN, achieving consistent performance gains over V‑JEPA 2 in multi‑step visual planning.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
