Machine Heart
Apr 20, 2026 · Artificial Intelligence
AURA: Real-Time Video Understanding Shifts from Post-Play Q&A to Continuous Interaction
AURA is an always-on video LLM that processes streams frame by frame and decides when to stay silent and when to answer. It uses a dual sliding-window context and a Silent-Speech Balanced Loss, achieves state-of-the-art scores on StreamingBench, OVO-Bench, and OmniMMI, and runs at 2 FPS with about 312 ms end-to-end latency on two 80 GB GPUs.
AURA · Real-Time Interaction · Silent-Speech Loss
