Machine Heart
Apr 21, 2026 · Artificial Intelligence
ControlAudio Enables Scripted Timing and Speech Control in Text-to-Audio Generation
ControlAudio, a progressive diffusion model presented at ACL 2026, jointly models text, timing, and phoneme information to achieve precise event timing and intelligible speech in text-to-audio generation. It is backed by a large dataset that mixes real and synthetic data, and it reports competitive experimental results.
Audio Generation · ControlAudio · Multimodal Learning
