ControlAudio: Script-Driven, Time-Precise Text-to-Audio Generation Presented at ACL 2026
ControlAudio, a progressive diffusion framework introduced by researchers at Tsinghua University, unifies text, timing, and phoneme modeling within one framework. This gives precise control over both when sounds occur and what is spoken, and the framework achieves superior alignment and intelligibility while preserving high-fidelity audio generation.
