Meituan Technology Team
Apr 16, 2026 · Artificial Intelligence
Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT
LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.
AI research · Audio Generation · diffusion model
