Qwen3‑TTS: 3‑Second Voice Cloning and Fine‑Grained Control with 5M‑Hour Dataset

The article introduces Qwen3‑TTS, a dual‑track multilingual text‑to‑speech model trained on over five million hours of speech, detailing its two tokenizers, 3‑second voice‑cloning capability, SOTA benchmark results, and step‑by‑step instructions for running the demo on HyperAI.

HyperAI Super Neural

Qwen3‑TTS is presented as a high‑quality, controllable multilingual TTS model that leverages a dual‑track language‑model architecture to generate speech while allowing fine‑grained control over output characteristics.

The model was trained on more than 5 million hours of speech data covering ten languages and incorporates two distinct speech tokenizers:

Qwen‑TTS‑Tokenizer‑25Hz: a single‑codebook codec focused on semantic content, compatible with Qwen‑Audio and using block‑wise DiT for streaming waveform reconstruction.

Qwen‑TTS‑Tokenizer‑12Hz: a multi‑codebook design (12.5 Hz, 16 layers) with a lightweight causal ConvNet, achieving 97 ms first‑packet latency for ultra‑low‑delay streaming.
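
To make the difference between the two tokenizers concrete, the short sketch below works out frame duration and code throughput from the figures quoted above. It is illustrative arithmetic only, not part of the Qwen3‑TTS codebase: the `TokenizerSpec` class and its field names are hypothetical, and the 25 Hz frame rate of the first tokenizer is assumed from its name rather than stated in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenizerSpec:
    """Hypothetical container for the figures quoted in the article."""
    name: str
    frame_rate_hz: float  # token frames emitted per second of audio
    codebooks: int        # codes per frame (1 = single-codebook)

    @property
    def frame_ms(self) -> float:
        # Duration of audio covered by one token frame, in milliseconds.
        return 1000.0 / self.frame_rate_hz

    @property
    def codes_per_second(self) -> float:
        # Total discrete codes produced per second across all codebooks.
        return self.frame_rate_hz * self.codebooks

    def frames_for(self, seconds: float) -> float:
        # Number of token frames covering a clip of the given length.
        return self.frame_rate_hz * seconds


# Assumption: 25 Hz is inferred from the tokenizer's name.
TOKENIZER_25HZ = TokenizerSpec("Qwen-TTS-Tokenizer-25Hz", 25.0, 1)
# 12.5 Hz and 16 layers are the values given in the article.
TOKENIZER_12HZ = TokenizerSpec("Qwen-TTS-Tokenizer-12Hz", 12.5, 16)

for spec in (TOKENIZER_25HZ, TOKENIZER_12HZ):
    print(
        f"{spec.name}: {spec.frame_ms:.0f} ms/frame, "
        f"{spec.codes_per_second:.0f} codes/s, "
        f"{spec.frames_for(3.0):.1f} frames for a 3 s reference clip"
    )
```

Under these assumptions, the 12.5 Hz tokenizer emits one frame per 80 ms of audio and 200 codes per second, so the reported 97 ms first‑packet latency is only slightly longer than a single frame.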

Extensive experiments show that the model reaches state‑of‑the‑art performance on multilingual TTS test sets and the InstructTTSEval benchmark, in both objective and subjective evaluations.

The article then provides a practical tutorial for running the Qwen3‑TTS demo on the HyperAI platform. Users are guided to locate the tutorial page, clone the repository, select an NVIDIA GeForce RTX 5090 GPU with a PyTorch image, choose a pricing plan, and launch the Jupyter workspace. Once the environment is ready, users follow the README to generate speech and access the resulting audio via the provided API endpoint.

Additional notes mention a promotional offer: new users can obtain 20 hours of RTX 5090 compute for $1, with the resources remaining permanently available.

Tags: benchmark, AI model, tutorial, multilingual, text-to-speech, voice cloning, Qwen3-TTS
Written by HyperAI Super Neural

Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.
