Bilibili Tech
Jul 11, 2025 · Artificial Intelligence
IndexTTS2: Emotionally Expressive, Duration-Controlled Zero-Shot TTS
IndexTTS2 is an autoregressive zero-shot text-to-speech model that achieves precise duration control and fine-grained emotional expression. It combines a universal time-encoding mechanism, decoupled modeling of voice style and emotion, and GPT-style latent features, and it outperforms state-of-the-art baselines across multiple benchmarks.
duration control · emotional synthesis · speech generation
