Tag

text-to-speech

DataFunTalk
Mar 21, 2025 · Artificial Intelligence

OpenAI Unveils New STT and TTS Models: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts – Performance, Pricing, and Demo

OpenAI announced three new speech models—two STT models (gpt-4o-transcribe and its lightweight gpt-4o-mini-transcribe) and one TTS model (gpt-4o-mini-tts)—showcasing strong accuracy on multilingual benchmarks, competitive pricing, and a quick‑start API demo for developers.

AI models · GPT-4o · OpenAI
8 min read

Python Programming Learning Circle
Mar 20, 2025 · Artificial Intelligence

Building a Python Voice Synthesis System Using Xunfei WebAPI

This tutorial explains how to create a Python-based speech synthesis tool by installing required packages, configuring Xunfei Open Platform credentials, implementing a Tkinter GUI, and using WebSocket communication to convert text into audio with selectable voice profiles.

GUI · Speech Synthesis · WebSocket
8 min read

System Architect Go
Nov 28, 2024 · Artificial Intelligence

An Overview of Modern AI Audio Technologies: ASR, TTS, and Voice Cloning

This article explains how modern AI advances have transformed audio processing. It covers digital audio fundamentals, automatic speech recognition (ASR), text‑to‑speech (TTS), and voice cloning techniques, and provides practical Python code examples using OpenAI Whisper and HuggingFace TTS models.

AI · Audio Processing · Speech Recognition
7 min read

Rare Earth Juejin Tech Community
Nov 19, 2023 · Artificial Intelligence

Transformers.js 2.7.0 Adds Text‑to‑Speech Support and Demo Application

The new Transformers.js 2.7.0 release introduces text‑to‑speech capabilities, provides a simple browser demo, explains how to save audio with the wavefile NPM package, offers speaker selection from a large CMU Arctic dataset, and lists additional library updates.

AI · Demo · JavaScript
3 min read

php中文网 Courses
Sep 1, 2023 · Artificial Intelligence

Integrating Baidu Text-to-Speech API with PHP

This tutorial demonstrates how to obtain Baidu TTS credentials, construct the required signature, send an HTTP request using PHP's cURL library, and save the returned audio data as an MP3 file, providing a complete code example for developers.

API Integration · Baidu TTS · PHP
5 min read

58 Tech
Aug 25, 2023 · Artificial Intelligence

Voice Cloning Technology in AI Sales Assistant

This article introduces the AI sales assistant from 58.com, detailing its background, a few‑shot voice cloning approach using real dialogue data, multi‑accent naturalness optimization, deployment architecture, and future plans, while evaluating performance metrics and discussing challenges in speech synthesis quality and stability.

AI sales assistant · Speech Synthesis · few-shot learning
19 min read

Xiaohongshu Tech REDtech
Aug 10, 2022 · Artificial Intelligence

Multi-Stage Multi-Codebook VQ-VAE for High-Performance Neural Text-to-Speech (MSMC‑TTS)

The MSMC‑TTS system, a multi‑stage multi‑codebook VQ‑VAE based neural text‑to‑speech solution, delivers near‑human audio quality (MOS 4.41) with a compact 3.12 MB acoustic model, substantially surpassing Mel‑Spectrogram FastSpeech baselines in naturalness and efficiency.

Compact Representation · Multi-Stage Modeling · Neural TTS
10 min read

Sohu Tech Products
Jul 20, 2022 · Mobile Development

Building a Mobile Paper‑Reading App with OpenCV OCR and Text‑to‑Speech

A middle‑aged Android developer recounts breaking his child's "Niu Ting Ting" device, then details how he recreated its functionality by integrating OpenCV‑based paper detection, OCR, and TTS into a mobile app, complete with code snippets and performance results.

Android · OCR · image processing
14 min read

Python Programming Learning Circle
Apr 22, 2022 · Artificial Intelligence

Building a Python Voice Chatbot with Baidu AI Speech Recognition and Qingyunke

This tutorial explains how to create a Python voice chatbot by recording audio, converting speech to text with Baidu AI, sending the text to the Qingyunke chatbot API for a response, and finally synthesizing the reply back into speech using pyttsx3.

Chatbot · Speech Recognition · baidu-ai
8 min read

Python Programming Learning Circle
Apr 4, 2022 · Artificial Intelligence

Building a Simple Speech Synthesis System with iFlytek WebAPI in Python

This tutorial explains how to create a lightweight speech synthesis tool using iFlytek's WebAPI, covering required environment setup, API credential acquisition, GUI design with Tkinter, and detailed Python code for WebSocket communication, audio handling, and WAV file generation.
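
The WAV file generation step mentioned above can be sketched with Python's standard `wave` module. This is a minimal illustration, not the tutorial's own code: it assumes the audio arrives as raw 16-bit mono PCM at 16 kHz (an assumption, not stated in this listing), and `pcm_to_wav_bytes` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    The defaults (16 kHz, mono, 16-bit) are assumptions for illustration;
    use whatever format the speech service actually returns.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)             # header sizes are fixed up on close
    return buf.getvalue()


# One second of silence stands in for the PCM chunks collected from the service.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(silence)
```

In a streaming setup, the PCM chunks received over the WebSocket would typically be concatenated first and then passed to the helper once, so the header reflects the full audio length.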

Audio Processing · Python · Speech Synthesis
8 min read

DataFunTalk
Mar 26, 2022 · Artificial Intelligence

Advances in Alibaba's Digital Human (XiaoMi) Technology: Development, Construction, and Interaction

This article reviews Alibaba's XiaoMi digital human technology: its evolution since 2019, a six‑stage pipeline for building avatars, and methods to enhance emotional, textual, vocal, and motion expressiveness. It also covers approaches for improving long‑term interactive capabilities, such as controllable script generation, multimodal QA, sign‑language translation, and intelligent behavior decision, and concludes with the release of the MMTK multimodal algorithm library.

Digital Human · Emotion Modeling · Interactive AI
11 min read

Test Development Learning Exchange
Oct 17, 2021 · Artificial Intelligence

Using pyttsx3 for Text-to-Speech in Python

This article provides a hands‑on guide to using the pyttsx3 library for offline text‑to‑speech conversion in Python, covering installation, basic playback, voice property adjustments, multilingual support, and conditional speech examples with counters.

Python · Speech Synthesis · conditional speech
7 min read

DataFunTalk
Nov 5, 2019 · Artificial Intelligence

Low-Resource Text-to-Speech: FastSpeech, LightTTS, and LightBERT Overview

This article reviews recent advances in low‑resource text‑to‑speech synthesis, covering the background of TTS, challenges in data‑ and compute‑limited scenarios, and detailed descriptions of FastSpeech, LightTTS, LightBERT, and related lightweight vocoder techniques, along with experimental results and future research directions.

Artificial Intelligence · FastSpeech · LightTTS
20 min read

Baidu Tech Salon
Jul 29, 2014 · Artificial Intelligence

Baidu Speech Synthesis: Balancing Trade‑offs and Opening the Platform to Developers

Baidu’s speech synthesis system, developed since 2013 to give machines natural Chinese voices, tackles millions of tonal variations through phonetic compression and optimized acoustic models, balances trade‑offs in data and scalability, and offers a free open platform that lets developers integrate high‑quality text‑to‑speech into apps, advancing equal access to information.

Artificial Intelligence · Baidu · HMM
6 min read