Old Zhang's AI Learning
Feb 7, 2026 · Artificial Intelligence

Zero‑Shot Voice Cloning with Emotion and Duration Control: IndexTTS‑2 Runs Locally

IndexTTS‑2, an open‑source zero‑shot TTS system from Bilibili, supports precise duration control, separate control of emotion and timbre, and bilingual synthesis. It offers a modern uv‑based installation and GPU‑accelerated inference, and reports benchmark‑leading WER and emotional‑similarity scores against contemporary models.

AI · IndexTTS-2 · duration control
0 likes · 10 min read
Bilibili Tech
Jul 11, 2025 · Artificial Intelligence

IndexTTS2: Emotionally Expressive, Duration-Controlled Zero-Shot TTS

IndexTTS2 is an autoregressive zero‑shot text‑to‑speech model that achieves precise duration control and fine‑grained emotional expression. It combines a universal time‑encoding mechanism, decoupled modeling of voice style and emotion, and GPT‑style latent features, and it outperforms state‑of‑the‑art baselines across multiple benchmarks.

duration control · emotional synthesis · speech generation
0 likes · 23 min read