WorldCache Boosts Video World Model Inference Up to 3.7× with Near‑Lossless Quality

WorldCache separates cacheable from recomputable tokens in diffusion world models using curvature-based classification and a chaotic-prioritized adaptive skipping schedule. It reaches a 3.65× speedup on HunyuanVoyager-13B and 1.68× on Aether-5B with no retraining and negligible extra memory, while largely preserving visual quality.

Machine Heart

Diffusion world models are hard to accelerate because they generate multimodal outputs (RGB, depth, camera trajectory) and tokens evolve at heterogeneous rates; treating all tokens and timesteps uniformly either wastes computation on easy tokens or accumulates error on difficult ones.

WorldCache addresses this by first estimating each token's trajectory curvature from the three most recent full forward passes, converting speed and acceleration into a curvature score. Tokens are then grouped into Stable (low curvature), Linear (moderate curvature), and Chaotic (high curvature), and each group gets its own caching rule: direct reuse, linear extrapolation, or a Hermite-weighted damped update, respectively.
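The classification step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the article does not give the exact curvature formula, so the score below (finite-difference acceleration divided by speed), the thresholds `low` and `high`, and the `damping` factor are all hypothetical. A plain damped step also stands in for the Hermite-weighted update, whose exact form is not described here.

```python
import math

EPS = 1e-8  # guards against division by zero for motionless tokens


def _norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def curvature_score(f_prev2, f_prev1, f_curr):
    """Curvature proxy from three consecutive full-pass features of one token.

    Assumed form: ||acceleration|| / (||velocity|| + EPS), both taken as
    finite differences. The paper's actual score may differ.
    """
    velocity = [c - p for c, p in zip(f_curr, f_prev1)]
    accel = [c - 2.0 * p1 + p2 for c, p1, p2 in zip(f_curr, f_prev1, f_prev2)]
    return _norm(accel) / (_norm(velocity) + EPS)


def classify(score, low=0.05, high=0.5):
    """Map a score to a group; the two thresholds are hypothetical."""
    if score < low:
        return "stable"
    if score < high:
        return "linear"
    return "chaotic"


def predict(f_prev1, f_curr, label, damping=0.5):
    """Approximate the next feature of a token without a full forward pass."""
    velocity = [c - p for c, p in zip(f_curr, f_prev1)]
    if label == "stable":
        return list(f_curr)  # direct reuse
    if label == "linear":
        return [c + v for c, v in zip(f_curr, velocity)]  # linear extrapolation
    # Stand-in for the Hermite-weighted damped update: a shortened step.
    return [c + damping * v for c, v in zip(f_curr, velocity)]


history = ([0.0, 0.0], [1.0, 0.0], [2.2, 0.0])  # one token, three full passes
label = classify(curvature_score(*history))
next_feature = predict(history[1], history[2], label)
```

With this particular score a token that does not move scores zero and is reused directly, while the sample token above, which speeds up slightly, lands in the Linear group and is extrapolated.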

The second component, Chaotic-prioritized Adaptive Skipping, monitors only the Chaotic tokens. By normalising curvature-based feature differences into a dimensionless drift metric, the system triggers a full recomputation precisely when a critical token begins to diverge, avoiding unnecessary full passes during stable periods.
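A rough sketch of such a skip decision follows. It assumes, purely for illustration, that drift is the distance between a token's cached estimate and its feature at the last full pass, divided by the norm of that feature, and that the worst Chaotic token decides. The function names, the `threshold` value, and the use of a maximum are hypothetical choices, not details taken from the paper.

```python
import math


def relative_drift(reference, estimate, eps=1e-8):
    """Dimensionless drift: ||estimate - reference|| / (||reference|| + eps)."""
    diff = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)))
    scale = math.sqrt(sum(r * r for r in reference))
    return diff / (scale + eps)


def should_recompute(labels, reference, estimate, threshold=0.1):
    """Return True when any Chaotic token has drifted past the threshold.

    labels:    per-token group names ("stable", "linear", "chaotic")
    reference: per-token features from the last full forward pass
    estimate:  per-token features currently served from the cache
    Stable and Linear tokens are deliberately ignored.
    """
    drifts = [
        relative_drift(reference[i], estimate[i])
        for i, label in enumerate(labels)
        if label == "chaotic"
    ]
    if not drifts:
        return False  # nothing to monitor, keep skipping
    return max(drifts) > threshold


labels = ["stable", "chaotic"]
reference = [[1.0, 0.0], [1.0, 0.0]]
calm = [[5.0, 0.0], [1.05, 0.0]]       # chaotic token drifted by about 5%
diverging = [[1.0, 0.0], [1.5, 0.0]]   # chaotic token drifted by about 50%

skip_step = not should_recompute(labels, reference, calm)
full_pass = should_recompute(labels, reference, diverging)
```

Because the metric is relative, one threshold can be shared across tokens whose feature magnitudes differ, which is the practical reason for making drift dimensionless.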

Experiments on the image-to-world task of HunyuanVoyager-13B show end-to-end latency dropping from 1053.7 s to 288.6 s (3.65× faster), while Dynamic WorldScore dips only slightly to 45.43 (baseline 46.40), with PSNR 23.49 and LPIPS 0.176; memory usage is 50.58 GB versus 50.44 GB for the baseline. On Aether-5B, latency falls from 180.5 s to 107.2 s (1.68×) with Dynamic WorldScore 44.72, PSNR 31.87, SSIM 0.924, LPIPS 0.066, and memory at 46.59 GB. In a 3D reconstruction setting, latency drops from 55.42 s to 21.20 s (2.61×) while preserving Abs Rel 0.341 and RPE trans 0.068 and achieving the lowest rotation error of 0.796.
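The reported speedups follow directly from the latency pairs, as a quick check shows. The helper name `speedup` and the dictionary layout are just for this illustration; the numbers are the ones quoted above.

```python
def speedup(baseline_s, accelerated_s):
    """Speedup factor: baseline latency divided by accelerated latency."""
    return baseline_s / accelerated_s


# (baseline seconds, WorldCache seconds) as reported in the article
latencies = {
    "HunyuanVoyager-13B image-to-world": (1053.7, 288.6),
    "Aether-5B": (180.5, 107.2),
    "3D reconstruction": (55.42, 21.20),
}

factors = {name: round(speedup(b, a), 2) for name, (b, a) in latencies.items()}
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")

# Memory overhead on HunyuanVoyager-13B, in percent of the baseline footprint
memory_overhead_pct = round((50.58 - 50.44) / 50.44 * 100, 2)
```

The three factors come out to 3.65, 1.68 and 2.61, matching the figures in the text, and the extra memory on HunyuanVoyager-13B is about 0.28% of the baseline.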

Thus WorldCache demonstrates that respecting the intrinsic multimodal coupling, spatial variance, and non‑uniform temporal dynamics of world models enables substantial inference acceleration without additional training or memory overhead, opening a path toward more interactive and longer‑horizon simulation applications.

