On‑Device TTS Breakthrough: NeuTTS‑Air Achieves 3‑Second Audio Cloning with a 0.5B Model

NeuTTS‑Air, an open‑source on‑device text‑to‑speech model built on a 0.5B Qwen LLM and NeuCodec, reaches SOTA among open models, runs entirely on CPU, supports 3‑second voice cloning, and comes with a step‑by‑step tutorial for deployment on edge devices.

HyperAI Super Neural

Nov 4, 2025

Traditional high‑quality TTS models demand substantial compute and usually depend on cloud services. They are expensive to run and need minutes of audio to train a voice, which raises deployment barriers and limits their use in privacy‑sensitive scenarios.

NeuTTS‑Air takes a different approach. It is presented as the world's first on‑device TTS language model to combine ultra‑realistic synthesis with instant voice cloning. Built on a 0.5B‑parameter Qwen large language model paired with the NeuCodec audio codec, it shows strong few‑shot learning: a 3‑second reference clip is enough to clone a voice. It also generates natural dialogue and can generalize to embedded agents and style‑transfer tasks.
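Because cloning depends on a short reference clip, it can help to check the clip's length before submitting it. The following is a minimal sketch, assuming the reference is an uncompressed PCM WAV file. The 3‑second threshold comes from the article's claim; whether the model itself enforces a minimum length is not stated, and the function names are hypothetical.

```python
import wave

# Threshold taken from the article's "3-second cloning" claim (an assumption
# about what counts as a usable clip, not a documented model requirement).
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_reference(path: str, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Return True when the clip is at least `min_seconds` long."""
    return wav_duration_seconds(path) >= min_seconds


def write_silence(path: str, seconds: float, sample_rate: int = 16000) -> None:
    """Write a mono 16-bit silent WAV; used here only to exercise the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
```

Calling `is_usable_reference("my_voice.wav")` on a recording would return `False` for a clip shorter than 3 seconds, which is a cheap guard to run before uploading.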

In the reported evaluation, NeuTTS‑Air achieves state‑of‑the‑art results among open‑source models, particularly on ultra‑realistic synthesis and real‑time inference benchmarks. The post‑training stage adds GGML/ONNX support and a watermarking mechanism. The model leads open‑source alternatives in power‑consumption tests and matches closed‑source models in certain scenarios. Crucially, inference runs on a CPU, which makes the model suitable for phones, laptops, and Raspberry Pi devices.
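The article does not say how its real‑time inference benchmark is defined. A common way to quantify real‑time performance for TTS is the real‑time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1 means speech is generated faster than it plays back. The sketch below assumes that metric; `synthesize` is a hypothetical callable standing in for any TTS engine, not a NeuTTS‑Air API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical: text -> samples
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and return its real-time factor."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    # Audio duration in seconds is the sample count divided by the sample rate.
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

For example, a model that takes 0.5 seconds to produce 2 seconds of audio has an RTF of 0.25. On a CPU‑only device, measuring this over several sentences gives a rough sense of whether the model keeps up with playback.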

Tutorial steps to run NeuTTS‑Air on the HyperAI platform:

1. Visit the HyperAI homepage, open the “Tutorial” section, select “NeuTTS‑Air: Lightweight High‑Efficiency Voice Cloning Model”, and click “Run this tutorial”.

2. On the tutorial page, click the top‑right “Clone” button to copy the tutorial into your own container.

3. Choose the “NVIDIA GeForce RTX 5090” GPU and a PyTorch image, select a payment plan (Pay‑As‑You‑Go, Daily, Weekly, or Monthly), and click “Continue job execution”.

4. Wait roughly three minutes for resource allocation; once the status changes to “Running”, click the arrow next to the API address to open the demo page (real‑name verification required).

5. In the demo, upload a reference audio file, enter the reference text, type the desired output text in “Text to Generate”, and submit to receive the cloned audio.
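The three demo inputs (reference audio, reference text, and text to generate) suggest a simple encode‑then‑infer flow. The sketch below illustrates that flow against a hypothetical engine interface. The method names `encode_reference` and `infer`, the assumption that the engine returns float samples in the range -1 to 1, and the 24 kHz default sample rate are all placeholders chosen for illustration; check them against the project's own documentation before relying on them.

```python
import struct
import wave
from typing import Any, Protocol, Sequence


class CloningEngine(Protocol):
    """Hypothetical engine interface; method names are assumptions."""

    def encode_reference(self, ref_audio_path: str) -> Any: ...

    def infer(self, text: str, ref_codes: Any, ref_text: str) -> Sequence[float]: ...


def clone_to_wav(
    engine: CloningEngine,
    ref_audio_path: str,
    ref_text: str,
    text_to_generate: str,
    out_path: str,
    sample_rate: int = 24000,  # assumed output rate, not confirmed by the article
) -> int:
    """Run the clone flow and save mono 16-bit PCM; return the sample count."""
    if not ref_text.strip() or not text_to_generate.strip():
        raise ValueError("reference text and text to generate must be non-empty")

    # Step 1: turn the reference clip into whatever codes the engine uses.
    ref_codes = engine.encode_reference(ref_audio_path)
    # Step 2: generate speech for the new text in the reference voice.
    samples = engine.infer(text_to_generate, ref_codes, ref_text)

    # Clip to [-1, 1] and scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(out_path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        # WAV stores samples little-endian, hence the explicit "<" format.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)
```

Keeping the engine behind a small interface means the same function can be exercised with a stub in tests and later pointed at a real model object that exposes equivalent methods.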

The hosted tutorial runs on a cloud GPU, but because NeuTTS‑Air can also run inference on a CPU, it lowers the barrier to deploying high‑quality, low‑latency TTS on edge devices and enables ultra‑realistic voice synthesis without reliance on cloud‑based large models.
