Tagged articles
2 articles
Old Zhang's AI Learning
Feb 7, 2026 · Artificial Intelligence

Zero‑Shot Voice Cloning with Emotion and Duration Control: IndexTTS‑2 Runs Locally

IndexTTS‑2, an open‑source zero‑shot TTS system from Bilibili, offers precise duration control, separation of emotion from speaker timbre, and bilingual synthesis. The article covers its uv‑based installation and GPU‑accelerated local inference, and reports WER and emotional‑similarity scores that lead the benchmarks against contemporary models.

AI · IndexTTS-2 · Speech synthesis
Tencent Cloud Developer
Jun 14, 2024 · Artificial Intelligence

GPT-4o Speech Multimodal Technology: Speech Tokenization, LLM Integration, and Zero-shot TTS

GPT‑4o’s speech multimodal system discretizes audio into semantic and acoustic tokens, integrates these tokens with large language models through multi‑stage instruction tuning, and employs hierarchical zero‑shot text‑to‑speech decoding, enabling low‑latency, streaming, and prompt‑driven voice synthesis for applications like gaming.

AudioLM · GPT-4o · LLM integration