OmniVoice: A Zero-Shot TTS Paradigm Covering 600+ Languages
OmniVoice introduces a single-stage, diffusion-style language model that maps text directly to multi-codebook acoustic tokens. It achieves zero-shot voice cloning in over 600 languages with high intelligibility and a real-time factor (RTF, processing time divided by audio duration) as low as 0.025, meaning one second of speech takes roughly 25 ms to synthesize. This makes it suitable for large-scale multilingual deployment.
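
To make "diffusion-style language model over multi-codebook tokens" concrete, here is a minimal Python sketch of one common form of discrete diffusion decoding: masked iterative unmasking with a cosine schedule, as popularized by MaskGIT. This is an assumption, not OmniVoice's published procedure. The schedule, the step count, and the helper names `masked_diffusion_decode`, `cosine_mask_ratio`, and `random_scorer` are all hypothetical. The random scorer stands in for the trained model, which would condition on the input text and a reference voice prompt.

```python
import math
import random
from typing import Callable, List

MASK = -1  # sentinel id for a masked (not yet decoded) token

# Hypothetical model interface: given the current (codebooks x frames) grid,
# return a probability distribution over the vocabulary for every position.
Scorer = Callable[[List[List[int]]], List[List[List[float]]]]


def cosine_mask_ratio(step: int, total_steps: int) -> float:
    """Fraction of positions that remain masked after `step` of `total_steps`."""
    return math.cos(math.pi / 2 * step / total_steps)


def masked_diffusion_decode(
    num_codebooks: int,
    num_frames: int,
    scorer: Scorer,
    total_steps: int = 8,
) -> List[List[int]]:
    """Fill a (codebooks x frames) token grid by confidence-based unmasking (sketch)."""
    grid = [[MASK] * num_frames for _ in range(num_codebooks)]
    n = num_codebooks * num_frames
    for step in range(1, total_steps + 1):
        probs = scorer(grid)
        # Propose the most likely token for every still-masked position.
        candidates = []  # entries: (confidence, codebook, frame, token)
        for c in range(num_codebooks):
            for f in range(num_frames):
                if grid[c][f] == MASK:
                    p = probs[c][f]
                    token = max(range(len(p)), key=p.__getitem__)
                    candidates.append((p[token], c, f, token))
        # How many positions should still be masked after this step.
        keep_masked = math.floor(n * cosine_mask_ratio(step, total_steps))
        num_to_commit = len(candidates) - keep_masked
        # Commit the most confident proposals; the rest are re-predicted later.
        candidates.sort(key=lambda x: x[0], reverse=True)
        for _, c, f, token in candidates[:num_to_commit]:
            grid[c][f] = token
    return grid


def random_scorer(vocab_size: int, seed: int = 0) -> Scorer:
    """Hypothetical stand-in for a trained model: random distributions, demo only."""
    rng = random.Random(seed)

    def score(grid: List[List[int]]) -> List[List[List[float]]]:
        out = []
        for row in grid:
            out_row = []
            for _ in row:
                weights = [rng.random() for _ in range(vocab_size)]
                total = sum(weights)
                out_row.append([w / total for w in weights])
            out.append(out_row)
        return out

    return score


if __name__ == "__main__":
    tokens = masked_diffusion_decode(4, 10, random_scorer(vocab_size=16))
    for row in tokens:
        print(row)
```

Because every codebook and frame is predicted in parallel at each step, the number of model calls in this sketch is fixed by `total_steps` rather than by the output length, which is one reason non-autoregressive decoders of this kind can reach low real-time factors.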
