Meituan Technology Team
Apr 23, 2026 · Artificial Intelligence

LARYBench Introduces an ImageNet‑Style Benchmark for Embodied Action Representations Learned from Human Video

LARYBench (Latent Action Representation Yielding Benchmark) provides the first systematic, ImageNet‑scale evaluation of implicit action representations learned from large‑scale human video, decoupling representation quality from downstream control, and shows that general‑purpose vision models outperform specialized embodied models in both action generalization and control precision across diverse robot morphologies and environments.

Robotics · Vision-Language-Action · action representation
13 min read
Machine Heart
Apr 18, 2026 · Artificial Intelligence

Eliminating ‘Think‑Then‑Act’ Stalls: StreamingVLA Boosts VLA Speed by 2.4×

StreamingVLA introduces action‑flow matching and adaptive early observation to parallelize generation, execution, and perception in vision‑language‑action models, cutting per‑action latency from 49.9 ms to 31.6 ms, reducing stall time 6.5‑fold, and achieving up to 2.4× end‑to‑end speedup in LIBERO benchmarks and real‑world robot tests.

LIBERO · Latency · Parallel Execution
13 min read
Machine Heart
Apr 11, 2026 · Artificial Intelligence

Why VLA Pioneers Are Abandoning Vision‑Language‑Action Models

Generalist AI’s GEN-1 model achieves over 99% success rates and 2‑3× speed gains with only a tenth of the data; its founders argue that vision‑language‑action (VLA) models are merely a crutch and urge a shift toward goal‑driven training from scratch for physical AGI.

GEN-1 · Generalist AI · Goal-driven research
13 min read
Machine Heart
Mar 31, 2026 · Artificial Intelligence

Point‑VLA: Overcoming Embodied AI’s Language Bottleneck with Visual Grounding

The Point‑VLA method introduced by Qianxun AI’s Gaoyang team tackles the fundamental limits of language‑only instruction in vision‑language‑action models by adding visual grounding via bounding‑box cues, boosting real‑robot success rates from 32.4% to 92.5% across six challenging tasks.

Data Annotation · Multimodal Learning · Point-VLA
13 min read
HyperAI Super Neural
Feb 19, 2026 · Artificial Intelligence

World Model & VLA Breakthroughs: Top Papers from NVIDIA, ByteDance, Tsinghua and Others

This roundup highlights six recent embodied AI papers that advance world models and vision‑language‑action (VLA) techniques, covering DreamDojo's massive first‑person video model, the LingBot‑World simulator, the Agent World Model generator, BagelVLA, ACoT‑VLA, and the closed‑loop World‑VLA‑Loop framework.

Robotics · Synthetic Environments · Vision-Language-Action
8 min read
HyperAI Super Neural
Dec 12, 2025 · Artificial Intelligence

Weekly AI Paper Digest: Attention, Nvidia VLA, TTS, and Graph Neural Networks

This roundup presents five recent AI papers covering hierarchical sparse attention for ultra‑long contexts, Nvidia's Alpamayo‑R1 VLA model for autonomous driving, the non‑autoregressive F5‑TTS system, LatentMAS for latent‑space multi‑agent collaboration, and Deeper‑GXX, which deepens arbitrary graph neural networks, highlighting each method's key innovations and reported performance gains.

Attention Mechanism · Vision-Language-Action · autonomous driving
6 min read
Data Party THU
Oct 29, 2025 · Artificial Intelligence

Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?

The paper introduces RoboMonkey, a framework that applies a generate‑and‑verify paradigm and test‑time scaling to Vision‑Language‑Action models, showing that increasing sampling and verification at inference dramatically reduces action error across multiple VLA architectures, and presents scalable verifier training, synthetic data augmentation, and efficient deployment strategies.

AI research · Action Verification · RoboMonkey
8 min read
Amap Tech
Oct 6, 2025 · Artificial Intelligence

Breaking VLA Training Limits: World-Env’s Virtual Sandbox for Safe, Data‑Efficient Robotics

World-Env introduces a virtual training sandbox that eliminates physical interaction, dramatically improves data efficiency with just five expert demos per task, and employs a vision‑language model as a semantic judge to dynamically terminate actions, enabling safe, high‑performing VLA post‑training across diverse robotic benchmarks.

Data Efficiency · Vision-Language-Action · virtual environment
9 min read
AI Cyberspace
Feb 23, 2025 · Artificial Intelligence

How Helix Empowers Humanoid Robots to See, Hear, Understand, and Act

Helix is a groundbreaking Vision‑Language‑Action model that integrates perception, language understanding, and motor control, enabling humanoid robots to perform full upper‑body continuous movements, collaborate across multiple robots, grasp any household object via natural language, and run on low‑power embedded GPUs for commercial use.

Vision-Language-Action · embodied AI · generalist control
16 min read