May 24, 2026 · Artificial Intelligence

How Hallo‑Live Achieves Real‑Time Streaming Text‑Driven Audio‑Video Avatar Generation

Hallo-Live combines an asynchronous dual-stream diffusion framework with human-centric preference-guided distillation, letting text-driven audio-video avatars run at 20.38 FPS with 0.94 s latency. Compared with the teacher Ovi model, that is more than 16× the throughput and 99.3% lower latency, while visual quality and lip-sync are preserved.
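To put the relative figures in context, the following back-of-the-envelope sketch inverts them to estimate what the teacher's numbers would have to be. The function names are hypothetical, and the results are implied estimates derived only from the rounded figures in the summary (20.38 FPS, 16×, 0.94 s, 99.3%), not measurements reported for Ovi.

```python
def implied_teacher_fps(student_fps: float, speedup: float) -> float:
    """Throughput the baseline would have if the student is `speedup` times faster."""
    return student_fps / speedup


def implied_teacher_latency(student_latency_s: float, reduction_pct: float) -> float:
    """Latency the baseline would have if the student's latency is `reduction_pct` percent lower."""
    remaining_fraction = 1.0 - reduction_pct / 100.0  # share of baseline latency that is left
    return student_latency_s / remaining_fraction


if __name__ == "__main__":
    # Inputs are the figures quoted in the summary; outputs are estimates only.
    fps = implied_teacher_fps(20.38, 16.0)
    latency = implied_teacher_latency(0.94, 99.3)
    print(f"implied teacher throughput: at most about {fps:.2f} FPS")
    print(f"implied teacher latency: roughly {latency:.0f} s")
```

Because the summary says "more than 16×", the throughput value is an upper bound on the teacher's frame rate (about 1.27 FPS). The latency estimate (about 134 s) is sensitive to rounding in the 99.3% figure, so it should be read as an order of magnitude rather than an exact value.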

Hallo-Live · NVIDIA H200 · asynchronous dual-stream diffusion
