On‑Device TTS Breakthrough: NeuTTS‑Air Achieves 3‑Second Audio Cloning with a 0.5B Model

NeuTTS‑Air is an open‑source, on‑device text‑to‑speech model built on a 0.5B Qwen LLM and the NeuCodec audio codec. It reaches state‑of‑the‑art results among open‑source models, runs entirely on CPU, and supports voice cloning from 3 seconds of audio. This article also includes a step‑by‑step tutorial for running it.

HyperAI Super Neural

Traditional high‑quality TTS models demand substantial compute and usually depend on cloud services. They are costly to run and often need minutes of audio for training, which raises the barrier to deployment and limits their use in privacy‑sensitive scenarios.

NeuTTS‑Air takes a different approach. It is presented as the world's first on‑device TTS language model to deliver ultra‑realistic synthesis with instant voice cloning. It combines a 0.5B‑parameter Qwen large language model with the NeuCodec audio codec, shows strong few‑shot learning, and generalizes to embedded agents and style‑transfer tasks, while supporting voice cloning from 3 seconds of audio and natural dialogue generation.
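To make the 0.5B figure concrete, the short sketch below estimates how much memory the backbone's weights alone would occupy at several precisions. The precision levels are illustrative assumptions, not figures from the article, and the estimate ignores the NeuCodec codec, activations, and the per‑block overhead that real quantized formats add.

```python
def weight_memory_mib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in MiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 2)


NUM_PARAMS = 0.5e9  # from the article: a 0.5B-parameter backbone

# Illustrative precisions (assumed, not stated in the article).
for label, bits in [("FP32", 32), ("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_mib(NUM_PARAMS, bits):,.0f} MiB")
```

Under these assumptions the weights come to roughly 954 MiB at 16 bits and about 238 MiB at 4 bits, which is consistent with the claim that a model of this size can fit on phones and single‑board computers.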

Experimental evaluation shows that NeuTTS‑Air achieves state‑of‑the‑art performance among open‑source models, particularly on ultra‑realistic synthesis and real‑time inference benchmarks. After training, the model gains GGML/ONNX support and a watermarking mechanism. It leads open‑source models in power‑consumption tests and matches closed‑source models in some scenarios. Crucially, inference runs on a CPU, which makes the model suitable for phones, laptops, and Raspberry Pi devices.
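Real‑time inference is commonly summarized by the real‑time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1 means speech is generated faster than it plays back. The article does not say which metric its benchmarks use, so the following is only a minimal sketch of how RTF could be measured. The `fake_synthesize` function is a hypothetical stand‑in for a real model call, and the 24 kHz sample rate is an assumption.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> Sequence[float]:
    """Hypothetical stand-in for a TTS call: returns one second of silence."""
    return [0.0] * 24_000  # assumed 24 kHz output


if __name__ == "__main__":
    rtf = real_time_factor(fake_synthesize, "Hello from the edge.", 24_000)
    print(f"RTF: {rtf:.4f} (below 1.0 means faster than real time)")
```

To measure an actual model, replace `fake_synthesize` with a function that wraps the model's inference call and returns the generated samples.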

Tutorial steps to run NeuTTS‑Air on the HyperAI platform:

1. Visit the HyperAI homepage, open the “Tutorial” section, select “NeuTTS‑Air: Lightweight High‑Efficiency Voice Cloning Model”, and click “Run this tutorial”.

2. On the tutorial page, click the top‑right “Clone” button to copy the tutorial into your own container.

3. Choose the “NVIDIA GeForce RTX 5090” GPU and a PyTorch image, select a payment plan (Pay‑As‑You‑Go, Daily, Weekly, or Monthly), and click “Continue job execution”.

4. Wait roughly three minutes for resource allocation; once the status changes to “Running”, click the arrow next to the API address to open the demo page (real‑name verification required).

5. In the demo, upload a reference audio file, enter the reference text, type the desired output text in “Text to Generate”, and submit to receive the cloned audio.
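Before submitting, it can help to check the three demo inputs locally. The sketch below uses only the Python standard library and makes two assumptions the article does not state: that the reference clip is an uncompressed WAV file, and that it should be at least 3 seconds long (taken from the "3‑second" cloning claim). The function names are hypothetical and are not part of the demo or of NeuTTS‑Air.

```python
import wave
from typing import List

MIN_REFERENCE_SECONDS = 3.0  # assumption based on the article's "3-second" claim


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_demo_inputs(
    reference_wav: str, reference_text: str, text_to_generate: str
) -> List[str]:
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    duration = wav_duration_seconds(reference_wav)
    if duration < MIN_REFERENCE_SECONDS:
        problems.append(
            f"reference audio is {duration:.1f}s; expected at least "
            f"{MIN_REFERENCE_SECONDS:.0f}s"
        )
    if not reference_text.strip():
        problems.append("reference text is empty")
    if not text_to_generate.strip():
        problems.append("text to generate is empty")
    return problems
```

A call such as `check_demo_inputs("my_voice.wav", "Transcript of the clip.", "Text to speak.")` returns an empty list when all three inputs pass these checks; the file name here is a placeholder.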

This workflow demonstrates how NeuTTS‑Air lowers the barrier for deploying high‑quality, low‑latency TTS on edge devices, enabling ultra‑realistic voice synthesis without reliance on cloud‑based large models.
