Bilibili Tech
Jul 11, 2025 · Artificial Intelligence

IndexTTS2: Emotionally Expressive, Duration-Controlled Zero-Shot TTS

IndexTTS2 is an autoregressive zero-shot text-to-speech model that achieves precise duration control and fine-grained emotional expression. It combines a universal time-encoding mechanism, decoupled modeling of voice style and emotion, and GPT-style latent features, and it outperforms state-of-the-art baselines across multiple benchmarks.

duration control · emotional synthesis · speech generation