Tagged articles

Text-to-Audio

4 articles

Machine Learning Algorithms & Natural Language Processing

Apr 23, 2026 · Artificial Intelligence

ControlAudio: Script‑Driven, Time‑Precise Text‑to‑Audio Generation Presented at ACL 2026

ControlAudio, a progressive diffusion framework introduced by Tsinghua University researchers, jointly models text, timing, and phonemes to give precise control over when sounds occur and what is spoken. It improves temporal alignment and speech intelligibility while preserving high-fidelity audio generation.

ACL 2026 · ControlAudio · Multimodal Generation

11 min read

Machine Heart

Apr 21, 2026 · Artificial Intelligence

ControlAudio Enables Scripted Timing and Speech Control in Text-to-Audio Generation

ControlAudio, a progressive diffusion model presented at ACL 2026, jointly models text, timing, and phoneme information to achieve precise event timing and intelligible speech in text-to-audio generation. The work is supported by a large dataset combining real and synthetic data, and its experiments show competitive results.

ControlAudio · Multimodal Learning · Progressive Diffusion

10 min read

Tencent Cloud Developer

Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey serves as a portal that organizes AIGC research into seven areas: text generation, image generation, audio generation, cross-modal association, text-guided image synthesis, text-guided audio synthesis, and supporting resources. It covers seminal models such as GPT, diffusion models, CLIP, DALL·E, Stable Diffusion, and MusicLM, along with the key papers that shaped each field.

AIGC · CLIP · Diffusion Models

19 min read

Volcano Engine Developer Services

Feb 14, 2023 · Artificial Intelligence

How Make-An-Audio Turns Text Into Realistic Sound Effects

Make-An-Audio, a text-to-audio model developed jointly by Zhejiang University, Peking University, and Volcano Speech, uses a Distill-then-Reprogram strategy to generate high-quality, controllable sound effects from inputs in any modality. Its demos point to promising future AIGC applications.

AIGC · Deep Learning · Speech Synthesis

7 min read
