Aug 10, 2022 · Artificial Intelligence

Multi-Stage Multi-Codebook VQ-VAE for High-Performance Neural Text-to-Speech (MSMC‑TTS)

The MSMC‑TTS system, a multi‑stage multi‑codebook VQ‑VAE based neural text‑to‑speech solution, delivers near‑human audio quality (MOS 4.41) with a compact 3.12 MB acoustic model, substantially surpassing Mel‑Spectrogram FastSpeech baselines in naturalness and efficiency.

Compact RepresentationMulti-Stage ModelingSpeech synthesis

0 likes · 10 min read

Multi-Stage Multi-Codebook VQ-VAE for High-Performance Neural Text-to-Speech (MSMC‑TTS)

Zuoyebang Tech Team

May 19, 2022 · Artificial Intelligence

How to Achieve High‑Quality TTS with Only Minutes of Data

This article reviews neural speech synthesis, explains why large high‑quality paired data are essential, and presents a range of low‑resource solutions—including semi‑supervised pre‑training, cross‑language transfer, speaker embedding, and Conformer‑based model upgrades—demonstrating how the Zuoyebang team built a robust TTS system with as little as 7‑minute speaker recordings.

ConformerFastspeech2Speech synthesis

0 likes · 15 min read

How to Achieve High‑Quality TTS with Only Minutes of Data

neural TTS

Multi-Stage Multi-Codebook VQ-VAE for High-Performance Neural Text-to-Speech (MSMC‑TTS)

How to Achieve High‑Quality TTS with Only Minutes of Data