Meituan Technology Team
Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT combines a wave‑VAE with a diffusion Transformer to generate speech without intermediate spectrograms. It addresses the training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and reports state‑of‑the‑art zero‑shot voice cloning results on multiple benchmarks.

AI Research · Audio Generation · Diffusion Model
12 min read