LARYBench Introduces an ImageNet‑Style Benchmark for Embodied Action Representations Learned from Human Video
LARYBench (Latent Action Representation Yielding Benchmark) provides the first systematic, ImageNet‑scale evaluation of implicit action representations derived from large‑scale human video, decoupling representation quality from downstream control. Its results show that general‑purpose vision models outperform specialized embodied models in both action generalization and control precision across diverse robot morphologies and environments.
