Tagged articles

Text‑to‑Speech

55 articles · Page 1 of 1

Jun 18, 2026 · Artificial Intelligence

8 Must‑Watch Open‑Source TTS Projects for 2026

This article reviews eight open‑source text‑to‑speech systems—from lightweight, CPU‑only models to multilingual, podcast‑focused engines—detailing their architectures, language coverage, benchmark scores, licensing, and practical use‑case recommendations.

AI · Speech synthesis · Text‑to‑Speech

0 likes · 15 min read

Geek Labs

Jun 16, 2026 · Industry Insights

Top Open-Source AI Tools for Watermark Removal, PPT Creation, and TTS

This article reviews five trending GitHub projects that enhance AI‑generated content quality, covering automatic watermark removal, AI‑assisted PPT generation, high‑fidelity text‑to‑speech synthesis, text humanization, and techniques for preserving authorial voice after AI editing.

AI · GitHub Trending · PPT generation

0 likes · 8 min read

Weekly Large Model Application

Jun 10, 2026 · Artificial Intelligence

OmniVoice Studio: An Open-Source Alternative to ElevenLabs

OmniVoice Studio packages the OmniVoice TTS/ASR engine into a local desktop application that offers zero-shot voice cloning, voice design, cinematic dubbing, real-time dictation, and multi‑engine support. Because data stays on‑device, it is a privacy‑focused, free alternative to ElevenLabs, with support for 600+ languages and an extensible architecture.

Automatic Speech Recognition · Desktop Application · ElevenLabs

0 likes · 9 min read

Geek Labs

May 3, 2026 · Artificial Intelligence

VibeVoice: Microsoft’s Open‑Source Cutting‑Edge Speech AI Models

The article introduces Microsoft’s open‑source VibeVoice project, details its long‑audio ASR‑7B and real‑time TTS‑0.5B models and its continuous speech tokenizer and next‑token diffusion techniques, and provides quick‑start instructions for online demos and local deployment via Hugging Face.

Hugging Face · Microsoft · Text‑to‑Speech

0 likes · 3 min read

James' Growth Diary

May 2, 2026 · Artificial Intelligence

How to Add Real‑Time Speech Recognition and Streaming TTS to Your AI Agent

This guide walks through choosing the right voice‑agent architecture, implementing streaming ASR with WebSocket, triggering sentence‑by‑sentence TTS, wiring the three layers together via async generators, optimizing latency to under a second, and avoiding common pitfalls such as missing VAD and checkpoint persistence.

LangChain · Text‑to‑Speech · WebSocket

0 likes · 19 min read

IT Services Circle

Apr 21, 2026 · Artificial Intelligence

Top 10 Open‑Source AI Projects Transforming Multi‑Agent Development, Coding and More

This article surveys ten notable open‑source AI projects—from a visual multi‑agent IDE and a teammate‑style agent framework to AI‑enhanced coding workflows, a lifelong‑memory layer for Claude Code, a massive Chinese textbook repository, a universal Markdown converter, and a high‑quality TTS model—detailing their motivations, core features, benchmarks, and real‑world usage scenarios.

AI tools · LLM workflows · Markdown conversion

0 likes · 14 min read

Old Zhang's AI Learning

Apr 17, 2026 · Artificial Intelligence

Google Strikes Back: Gemini’s New Features Take on Claude Code

The article reviews Google Gemini’s three‑pronged rollout: a Mac desktop app with global shortcuts and window‑sharing, a Gemini CLI enhanced with Subagents that keep context clean and enable parallel expert tasks, and the Gemini 3.1 Flash TTS model with Audio Tags. It compares each to competitors and highlights practical use cases and limitations.

AI coding · Gemini CLI · Google Gemini

0 likes · 12 min read

Meituan Technology Team

Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.

AI research · Text‑to‑Speech · audio generation

0 likes · 12 min read

SuanNi

Apr 11, 2026 · Artificial Intelligence

Deploy Microsoft VibeVoice TTS for Real‑Time Multi‑Speaker Audio

This guide explains the features of Microsoft’s VibeVoice TTS models, including long‑context synthesis, low‑latency real‑time streaming, and multi‑speaker support, and provides step‑by‑step instructions for deploying the models on a GPU cloud platform using Python.

AI models · Multi-speaker · Realtime TTS

0 likes · 5 min read

AI Open-Source Efficiency Guide

Apr 6, 2026 · Artificial Intelligence

VibeVoice vs PersonaPlex vs OmniVoice: A Comprehensive Open‑Source AI Voice Comparison

This article provides a detailed side‑by‑side analysis of three open‑source speech AI projects—Microsoft's VibeVoice, NVIDIA's PersonaPlex, and Xiaomi's OmniVoice—covering their positioning, core models, technical highlights, multilingual support, performance metrics, licensing, and recommended use cases.

AI · Automatic Speech Recognition · Speech synthesis

0 likes · 15 min read

Open Source Tech Hub

Mar 7, 2026 · Artificial Intelligence

Building a Hands‑Free Voice Assistant with Neuron AI’s Multimodal Audio Providers

This guide explains how to use Neuron v3’s multimodal audio capabilities—including OpenAI and ElevenLabs text‑to‑speech and speech‑to‑text providers—to create a local, hands‑free voice assistant that captures audio, transcribes it, processes it via an agent, and plays back responses.

Agent · ElevenLabs · Multimodal

0 likes · 5 min read

HyperAI Super Neural

Mar 3, 2026 · Artificial Intelligence

Qwen3‑TTS: 3‑Second Voice Cloning and Fine‑Grained Control with 5M‑Hour Dataset

The article introduces Qwen3‑TTS, a dual‑track multilingual text‑to‑speech model trained on over five million hours of speech, detailing its two tokenizers, 3‑second voice‑cloning capability, SOTA benchmark results, and step‑by‑step instructions for running the demo on HyperAI.

AI model · Qwen3-TTS · Text‑to‑Speech

0 likes · 4 min read

PaperAgent

Jan 25, 2026 · Industry Insights

Top 10 Chinese Large Models to Watch: Features, Benchmarks, and Download Links

This roundup highlights ten cutting‑edge Chinese AI models—including Qwen3‑TTS, LongCat‑Flash‑Thinking‑2601, GLM‑4.7‑Flash, STEP3‑VL‑10B, Baichuan‑M3, and Youtu‑LLM—detailing their multilingual capabilities, architecture innovations, performance claims, and providing direct repository links for researchers and developers.

AI research · Chinese AI · Multimodal

0 likes · 7 min read

Ubuntu

Jan 25, 2026 · Artificial Intelligence

Deploy Alibaba Qwen3‑TTS on Ubuntu: 3‑Second Voice Cloning with 97 ms Latency

This guide walks through installing and running Alibaba's open‑source Qwen3‑TTS on Ubuntu, covering environment setup, GPU requirements, model selection, Python virtual‑environment creation, code examples for voice cloning and voice design, low‑latency streaming, Web UI launch, and common troubleshooting tips.

AI · Python · Qwen3-TTS

0 likes · 9 min read

Old Zhang's AI Learning

Jan 24, 2026 · Artificial Intelligence

Open-Source Qwen3‑TTS: Sub‑100 ms Latency, Runs on 8 GB GPU, and ComfyUI Integration

Qwen3‑TTS, an open‑source text‑to‑speech model from Alibaba, offers sub‑100 ms first‑packet latency, supports voice cloning, natural‑language voice design, and ten languages, can be deployed locally on a GPU with as little as 8 GB VRAM, and integrates with ComfyUI for visual workflow building.

ComfyUI · Qwen3-TTS · Text‑to‑Speech

0 likes · 15 min read

Old Meng AI Explorer

Jan 8, 2026 · Artificial Intelligence

How Microsoft’s Open‑Source VibeVoice Gives AI Speech Real Emotion

Microsoft’s open‑source VibeVoice model transforms text‑to‑speech by adding fine‑grained emotional control, multi‑scene styles, and support for over 100 languages, offering free commercial use, low‑latency local deployment, and detailed parameter settings that let developers and creators generate expressive, context‑aware audio for videos, audiobooks, chatbots, and more.

AI voice · Text‑to‑Speech · VibeVoice

0 likes · 10 min read

HyperAI Super Neural

Jan 3, 2026 · Artificial Intelligence

Clone a Voice in 5 Seconds with One‑Step Generation: Inside Chatterbox‑Turbo’s High‑Fidelity TTS

Resemble AI’s open‑source Chatterbox‑Turbo reduces TTS generation from ten steps to one, enabling high‑sample‑rate, lossless voice cloning from a 5‑10 second reference while supporting emotional control, paralinguistic tags, and embedded watermarking for real‑time applications across chatbots, games, podcasts, and education.

Chatterbox‑Turbo · Real-time inference · Text‑to‑Speech

0 likes · 7 min read

HyperAI Super Neural

Dec 12, 2025 · Artificial Intelligence

Weekly AI Paper Digest: Attention, Nvidia VLA, TTS, and Graph Neural Networks

This roundup presents five recent AI papers covering hierarchical sparse attention for ultra‑long context, Nvidia's Alpamayo‑R1 VLA model for autonomous driving, the non‑autoregressive F5‑TTS system, LatentMAS for latent‑space multi‑agent collaboration, and Deeper‑GXX that deepens arbitrary graph neural networks, highlighting each method's key innovations and reported performance gains.

Attention Mechanism · Graph Neural Networks · Multi-Agent Systems

0 likes · 6 min read

360 Smart Cloud

Nov 26, 2025 · Artificial Intelligence

How to Integrate a Multi‑Engine TTS Fusion Service for Stable High‑Quality Speech

This guide explains the challenges of using disparate TTS providers, introduces a unified multi‑engine speech synthesis service, details its technical highlights and typical use cases, and provides complete API specifications with request/response examples and authentication steps.

API · Cloud · Text‑to‑Speech

0 likes · 8 min read

Programmer DD

Oct 19, 2025 · Backend Development

How to Add Free Edge TTS to Your Spring Boot Application in Minutes

This tutorial shows how to integrate UnifiedTTS's free Edge TTS service into a Spring Boot project, covering project setup, API key registration, configuration, request/response models, service implementation, unit testing, and runtime verification with sample code and images.

API integration · Edge TTS · Java

0 likes · 9 min read

HyperAI Super Neural

Oct 8, 2025 · Artificial Intelligence

From WeChat’s AI Podcast Trial to Google, ByteDance and Xiaohongshu: Can AI Podcasts Capture the Emerging AIGC Blue Ocean?

The article examines how breakthroughs in large language models and high‑fidelity TTS are powering AI‑generated podcasts, analyzes the technical advances behind the "human‑like" sound, surveys major players such as Google, ByteDance, Xiaohongshu and startups, and evaluates the market potential of this rapidly expanding AIGC niche.

AI podcast · AIGC · ByteDance

0 likes · 9 min read

Python Programming Learning Circle

Oct 7, 2025 · Artificial Intelligence

Build a Voice-Enabled Chatbot in Python Using Baidu AI and Qingyunke

This tutorial walks through creating a Python program that captures spoken input, converts it to text with Baidu AI, sends the text to the free Qingyunke chatbot API for a response, and finally synthesizes the reply back into speech, complete with code snippets and setup instructions.

Baidu AI · Chatbot · Text‑to‑Speech

0 likes · 9 min read

Xiaohongshu Tech REDtech

Sep 19, 2025 · Artificial Intelligence

FireRedTTS-2: How the New Open-Source Model Achieves Human‑Like Multi‑Speaker Dialogue Synthesis

FireRedTTS-2, the latest open‑source dialogue TTS model from Xiaohongshu’s audio team, upgrades its speech tokenizer and text‑to‑speech architecture to enable low‑latency, per‑sentence generation, robust multi‑speaker switching, and natural prosody across multiple languages, outperforming rivals in both objective and subjective tests.

AI audio · Text‑to‑Speech · dialogue synthesis

0 likes · 10 min read

ShiZhen AI

Sep 11, 2025 · Artificial Intelligence

Exploring IndexTTS2.0: China’s Leading Open‑Source TTS with Precise Duration Control

IndexTTS2.0, a new Chinese open‑source autoregressive TTS model, introduces accurate duration control that removes the need for manual timing adjustments in video dubbing, along with four emotion‑control methods and high‑quality Chinese synthesis. The article offers code examples, demos, and a step‑by‑step usage guide.

IndexTTS · Python · Text‑to‑Speech

0 likes · 6 min read

DataFunSummit

Sep 7, 2025 · Artificial Intelligence

How NIO Cut Radio Production Costs by 80% with AI Voice Cloning

This article details NIO's AI‑driven voice‑cloning solution for its in‑car NIO Radio, explaining the business background, pain points of traditional production, the TTS‑VC framework and modular workflow, evaluation metrics, and the resulting cost savings, efficiency gains, and scalability across dozens of cities.

AI · Automotive · Speech synthesis

0 likes · 10 min read

Python Programming Learning Circle

Aug 22, 2025 · Artificial Intelligence

Build a Powerful Python Voice Assistant with GPT‑4: Step‑by‑Step Guide

This tutorial walks you through creating a Python voice assistant powered by GPT‑4, covering project setup, virtual environment creation, required package installation, core code for speech recognition, text‑to‑speech, command handling, and optional speech‑rate adjustment.

GPT-4 · Text‑to‑Speech · Voice Assistant

0 likes · 17 min read

Bilibili Tech

Jul 11, 2025 · Artificial Intelligence

IndexTTS2: Emotionally Expressive, Duration-Controlled Zero-Shot TTS

IndexTTS2 introduces a novel auto-regressive zero-shot text-to-speech model that achieves precise duration control and fine-grained emotional expression through a universal time‑encoding mechanism, decoupled voice‑style and emotion modeling, and a GPT‑style latent feature, outperforming state‑of‑the‑art baselines across multiple benchmarks.

Text‑to‑Speech · duration control · emotional synthesis

0 likes · 23 min read

Cognitive Technology Team

Jul 1, 2025 · Artificial Intelligence

How We Built a Live‑Streaming TTS Engine: From Data Pipelines to AI Voice Generation

This article presents a comprehensive practice summary of building an intelligent digital‑human system, covering six core modules—LLM content generation, LLM interaction, TTS synthesis, visual driving, audio‑video engineering, and backend services—while detailing data collection, signal processing, ASR annotation, speaker clustering, model optimization (V1‑V4), evaluation metrics, and future research directions.

AI voice · Audio Processing · LLM

0 likes · 23 min read

ShiZhen AI

May 13, 2025 · Artificial Intelligence

Top Free Text‑to‑Speech Tools for Content Creators

This article reviews five free text‑to‑speech solutions—AI易视频, Google TTS, Natural Reader, Balabolka, and Speech2Go—detailing their features, language support, installation needs, and unique capabilities to help creators choose the right tool for narration, translation, or multi‑character audio production.

AI · TTS · Text‑to‑Speech

0 likes · 7 min read

DataFunTalk

Mar 21, 2025 · Artificial Intelligence

OpenAI Unveils New STT and TTS Models: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts – Performance, Pricing, and Demo

OpenAI announced three new speech models—two STT models (gpt-4o-transcribe and its lightweight gpt-4o-mini-transcribe) and one TTS model (gpt-4o-mini-tts)—showcasing strong accuracy on multilingual benchmarks, competitive pricing, and a quick‑start API demo for developers.

AI models · GPT-4o · OpenAI

0 likes · 8 min read

Python Programming Learning Circle

Mar 20, 2025 · Artificial Intelligence

Building a Python Voice Synthesis System Using Xunfei WebAPI

This tutorial explains how to create a Python-based speech synthesis tool by installing required packages, configuring Xunfei Open Platform credentials, implementing a Tkinter GUI, and using WebSocket communication to convert text into audio with selectable voice profiles.

GUI · Speech synthesis · Text‑to‑Speech

0 likes · 8 min read

Test Development Learning Exchange

Jan 13, 2025 · Artificial Intelligence

Python Tool for Converting English Videos to Chinese Dubbed Videos with Subtitles

This article provides a comprehensive guide on developing a Python tool to convert English videos into versions with Chinese dubbing and subtitles, covering all steps from audio extraction to final synthesis.

AI tools · FFmpeg · Machine Translation

0 likes · 5 min read

Huolala Tech

Dec 26, 2024 · Artificial Intelligence

How Huolala’s In‑House TTS Overcomes Latency, Naturalness, and Multilingual Limits

This article details Huolala’s self‑developed Text‑to‑Speech system, outlining its architecture, the challenges of latency, naturalness, and language support, and the innovative solutions—including streaming synthesis, emotion modeling, and transfer‑learning‑based multilingual capabilities—that deliver more flexible and realistic voice interactions.

Emotion Modeling · Streaming TTS · Text‑to‑Speech

0 likes · 10 min read

Alibaba Cloud Developer

Nov 29, 2024 · Artificial Intelligence

Deploy GPT‑SoVITS for Text‑to‑Speech on Alibaba Cloud Function Compute – Step‑by‑Step Guide

This guide walks you through deploying the GPT‑SoVITS text‑to‑speech model on Alibaba Cloud Function Compute, covering application creation, quick voice synthesis, advanced model fine‑tuning, NAS file management, and optional promotional tasks for earning rewards.

Function Compute · GPT-SoVITS · Text‑to‑Speech

0 likes · 12 min read

System Architect Go

Nov 28, 2024 · Artificial Intelligence

An Overview of Modern AI Audio Technologies: ASR, TTS, and Voice Cloning

This article explains how modern AI advances have transformed audio processing, covering digital audio fundamentals, automatic speech recognition (ASR), text‑to‑speech (TTS), and voice cloning techniques, and provides practical Python code examples using OpenAI Whisper and HuggingFace TTS models.

AI · Audio Processing · Text‑to‑Speech

0 likes · 7 min read

Huolala Tech

Oct 8, 2024 · Mobile Development

iOS 17 Text‑to‑Speech Crash: Root Cause and Effective Fixes

This article investigates a recurring text‑to‑speech crash on iOS 17 devices, detailing the EXC_BAD_ACCESS error, analyzing stack traces, exploring internal AVAudioEngine and AUAudioUnit_XPC structures, and presenting two remediation strategies—including a hook‑based approach that safely bypasses problematic dealloc and stop calls.

AVAudioEngine · Crash · Hook

0 likes · 16 min read

ShiZhen AI

Jun 5, 2024 · Artificial Intelligence

How the Homegrown Open‑Source ChatTTS Model Scored 20K Stars in One Week

The article introduces ChatTTS, a dialogue‑optimized open‑source text‑to‑speech model trained on over 100,000 hours of Chinese and English data, highlights its fine‑grained prosody control and multi‑speaker support, notes its superior naturalness compared to most open‑source TTS systems, and outlines its current limitations such as poor Arabic numeral handling and slow inference speed.

ChatTTS · Chinese AI · Text‑to‑Speech

0 likes · 2 min read

Rare Earth Juejin Tech Community

Nov 19, 2023 · Artificial Intelligence

Transformers.js 2.7.0 Adds Text‑to‑Speech Support and Demo Application

The new Transformers.js 2.7.0 release introduces text‑to‑speech capabilities, provides a simple browser demo, explains how to save audio with the wavefile NPM package, offers speaker selection from a large CMU Arctic dataset, and lists additional library updates.

AI · JavaScript · Text‑to‑Speech

0 likes · 3 min read

php Courses

Sep 1, 2023 · Artificial Intelligence

Integrating Baidu Text-to-Speech API with PHP

This tutorial demonstrates how to obtain Baidu TTS credentials, construct the required signature, send an HTTP request using PHP's cURL library, and save the returned audio data as an MP3 file, providing a complete code example for developers.

API integration · Baidu TTS · PHP

0 likes · 5 min read

58 Tech

Aug 25, 2023 · Artificial Intelligence

Voice Cloning Technology in AI Sales Assistant

This article introduces the AI sales assistant from 58.com, detailing its background, a few‑shot voice cloning approach using real dialogue data, multi‑accent naturalness optimization, deployment architecture, and future plans, while evaluating performance metrics and discussing challenges in speech synthesis quality and stability.

AI sales assistant · Speech synthesis · Text‑to‑Speech

0 likes · 19 min read

Python Crawling & Data Mining

Oct 18, 2022 · Fundamentals

How to Make Python Speak with a Male Voice Using pyttsx3 and Registry Tweaks

This article walks through troubleshooting the pyttsx3 Python text‑to‑speech library on Windows, explains why only female voices appear by default, shows how to add the missing male “Kangkang” voice via registry edits, and provides complete working code examples for both voice selection and speech synthesis.

Programming Tutorial · Text‑to‑Speech · Windows Registry

0 likes · 5 min read

Xiaohongshu Tech REDtech

Aug 10, 2022 · Artificial Intelligence

Multi-Stage Multi-Codebook VQ-VAE for High-Performance Neural Text-to-Speech (MSMC‑TTS)

The MSMC‑TTS system, a multi‑stage multi‑codebook VQ‑VAE based neural text‑to‑speech solution, delivers near‑human audio quality (MOS 4.41) with a compact 3.12 MB acoustic model, substantially surpassing Mel‑Spectrogram FastSpeech baselines in naturalness and efficiency.

Compact Representation · Multi-Stage Modeling · Speech synthesis

0 likes · 10 min read

Sohu Tech Products

Jul 20, 2022 · Mobile Development

Building a Mobile Paper‑Reading App with OpenCV OCR and Text‑to‑Speech

A middle‑aged Android developer recounts breaking his child's "Niu Ting Ting" device, then details how he recreated its functionality by integrating OpenCV‑based paper detection, OCR, and TTS into a mobile app, complete with code snippets and performance results.

Android · Image processing · Mobile Development

0 likes · 14 min read

Python Programming Learning Circle

Apr 22, 2022 · Artificial Intelligence

Building a Python Voice Chatbot with Baidu AI Speech Recognition and Qingyunke

This tutorial explains how to create a Python voice chatbot by recording audio, converting speech to text with Baidu AI, sending the text to the Qingyunke chatbot API for a response, and finally synthesizing the reply back into speech using pyttsx3.

Baidu AI · Chatbot · Text‑to‑Speech

0 likes · 8 min read

Python Programming Learning Circle

Apr 4, 2022 · Artificial Intelligence

Building a Simple Speech Synthesis System with iFlytek WebAPI in Python

This tutorial explains how to create a lightweight speech synthesis tool using iFlytek's WebAPI, covering required environment setup, API credential acquisition, GUI design with Tkinter, and detailed Python code for WebSocket communication, audio handling, and WAV file generation.

Audio Processing · Python · Speech synthesis

0 likes · 8 min read

DataFunTalk

Mar 26, 2022 · Artificial Intelligence

Advances in Alibaba's Digital Human (XiaoMi) Technology: Development, Construction, and Interaction

This article reviews Alibaba's XiaoMi digital human technology, covering its evolution since 2019, a six‑stage pipeline for building avatars, methods to enhance emotional, textual, vocal, and motion expressiveness, and approaches for improving long‑term interactive capabilities such as controllable script generation, multimodal QA, sign‑language translation, and intelligent behavior decision, culminating in the release of the MMTK multimodal algorithm library.

Emotion Modeling · Multimodal AI · Text‑to‑Speech

0 likes · 11 min read

The Dominant Programmer

Feb 10, 2022 · Frontend Development

How to Add Click‑Triggered Text‑to‑Speech in Vue with speak‑tts

This guide shows how to integrate the speak‑tts library into a Vue component to trigger speech synthesis on button click, covering Chrome’s autoplay restrictions, npm installation, object initialization, button markup, click handler, and a complete working example.

JavaScript · Text‑to‑Speech · Vue

0 likes · 4 min read

Test Development Learning Exchange

Oct 17, 2021 · Artificial Intelligence

Using pyttsx3 for Text-to-Speech in Python

This article provides a hands‑on guide to using the pyttsx3 library for offline text‑to‑speech conversion in Python, covering installation, basic playback, voice property adjustments, multilingual support, and conditional speech examples with counters.

Python · Speech synthesis · Text‑to‑Speech

0 likes · 7 min read

Python Crawling & Data Mining

Nov 28, 2020 · Artificial Intelligence

How to Convert Text to Speech in Python with 5 Powerful TTS Libraries

This guide walks you through installing, configuring, and using five Python text‑to‑speech libraries—gTTS, Baidu AIP, pyttsx3, pywin32, and speech—to generate personalized audio files, adjust voice properties, and automate playback.

Python · Speech synthesis · Text‑to‑Speech

0 likes · 5 min read

The Dominant Programmer

Nov 25, 2020 · Frontend Development

How to Use the HTML5 SpeechSynthesis API for Browser Text‑to‑Speech

This guide shows how to create a simple web page that converts a given text string into spoken audio using the HTML5 SpeechSynthesis API, with a complete code example, execution steps, and browser compatibility notes.

HTML5 · JavaScript · SpeechSynthesis

0 likes · 3 min read

MaGe Linux Operations

Nov 26, 2019 · Artificial Intelligence

Create Cute Voiceovers with Baidu TTS and Python

This guide shows how to use Baidu's AI speech synthesis service with Python, covering SDK installation, app creation, obtaining credentials, and sample code to convert text—including daily quotes from an external API—into audio files, even customizing voice styles.

API · Baidu AI · Python

0 likes · 5 min read

DataFunTalk

Nov 5, 2019 · Artificial Intelligence

Low-Resource Text-to-Speech: FastSpeech, LightTTS, and LightBERT Overview

This article reviews recent advances in low‑resource text‑to‑speech synthesis, covering the background of TTS, challenges in data‑ and compute‑limited scenarios, and detailed descriptions of FastSpeech, LightTTS, LightBERT, and related lightweight vocoder techniques, along with experimental results and future research directions.

FastSpeech · LightTTS · Text‑to‑Speech

0 likes · 20 min read

Liangxu Linux

Sep 3, 2019 · Artificial Intelligence

Clone Any Voice in Seconds with the Real-Time-Voice-Cloning Open‑Source TTS

This guide explains how the Real-Time-Voice-Cloning project uses deep‑learning text‑to‑speech techniques to generate a voice clone from a short audio sample, covering the underlying principle, required dataset, setup steps, demo usage, and ethical considerations.

Real-Time-Voice-Cloning · Text‑to‑Speech · Voice Cloning

0 likes · 5 min read

21CTO

Dec 9, 2015 · Artificial Intelligence

iFLY Mobile Speech Platform: Enabling Voice Recognition and Synthesis

The article introduces iFLY’s Mobile Speech Platform (MSP), which integrates cloud‑based speech recognition and text‑to‑speech technologies to deliver high‑quality, multi‑channel voice services for Android, iOS and other devices, and details its four‑layer architecture, core functionalities, and the role of ASR and TTS in modern human‑machine interaction.

Mobile Development · Text‑to‑Speech · artificial-intelligence

0 likes · 5 min read

Baidu Tech Salon

Jul 29, 2014 · Artificial Intelligence

Baidu Speech Synthesis: Balancing Trade‑offs and Opening the Platform to Developers

Baidu’s speech synthesis system, developed since 2013 to give machines natural Chinese voices, tackles millions of tonal variations through phonetic compression and optimized acoustic models, balances trade‑offs in data and scalability, and offers a free open platform that lets developers integrate high‑quality text‑to‑speech into apps, advancing equal access to information.

Baidu · Developer Platform · HMM

0 likes · 6 min read
