How to Integrate a Multi‑Engine TTS Fusion Service for Stable High‑Quality Speech
This guide explains the challenges of working with disparate TTS providers, introduces a unified multi‑engine speech synthesis service, and covers its technical highlights, typical use cases, and complete API specification, including request/response examples and authentication steps.
Background
Text‑to‑speech (TTS) is a core capability in intelligent customer service, content creation, education, media broadcasting, and government information services. Existing vendor services differ in voice naturalness, latency, long‑text stability, and style, so integrating them directly leads to unstable synthesis, inconsistent voice quality, and high integration cost.
Product Overview
The TTS Fusion Service integrates three engines – Alibaba Cloud TTS, ByteDance premium long‑text + large‑model asynchronous TTS, and a self‑developed low‑latency engine – and provides intelligent scheduling, stable output, and a unified API.
Technical Highlights
Multi‑engine fusion algorithm: the same text is submitted to multiple TTS engines in parallel and the best result is automatically selected.
High‑concurrency, low‑latency architecture: microservice design with distributed queues.
Long‑text optimization: comma‑based sentence splitting and large‑model support for emotional expression and context understanding.
Typical Scenarios
Intelligent customer‑service voice broadcast
Audiobook / knowledge‑paid content production
Digital human or virtual anchor dubbing
Government information and public services
API Design
Create Asynchronous Task (/tts/async)
Method: POST
Headers: Content-Type: application/json, Authorization: Bearer <token>
JSON body parameters:
text (string, required): text to synthesize.
voice (string, optional, default: system voice): speaker.
format (string, required): output audio format (e.g., wav, mp3).
sample_rate (int, required): audio sample rate (e.g., 16000).
volume (int, optional, default 100): volume, 0 to 100.
speech_rate (int, optional, default 0): speech speed, -100 to 100.
pitch_rate (int, optional, default 0): pitch, -100 to 100.
enable_subtitle (bool, optional, default false): return per‑sentence subtitles.
enable_notify (bool, optional, default false): enable asynchronous callback.
notify_url (string, required if enable_notify is true): callback URL.
comma_flag (bool, optional, default false): enable comma‑based sentence splitting.
model_flag (bool, optional, default false): select between the ByteDance large‑model and premium long‑text modes.
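The parameter constraints above can be checked on the client before a task is submitted. The following is a minimal sketch: the helper name build_async_payload is hypothetical, and the service may apply its own, possibly different, validation.

```python
from typing import Any, Dict


def build_async_payload(
    text: str,
    audio_format: str,
    sample_rate: int,
    **options: Any,
) -> Dict[str, Any]:
    """Assemble a /tts/async JSON body and check the documented constraints."""
    if not text:
        raise ValueError("text is required")
    payload: Dict[str, Any] = {
        "text": text,
        "format": audio_format,
        "sample_rate": sample_rate,
    }
    payload.update(options)

    # Documented ranges: volume 0..100, speech_rate and pitch_rate -100..100.
    ranges = {"volume": (0, 100), "speech_rate": (-100, 100), "pitch_rate": (-100, 100)}
    for name, (low, high) in ranges.items():
        if name in payload and not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # notify_url is required only when asynchronous callbacks are enabled.
    if payload.get("enable_notify") and not payload.get("notify_url"):
        raise ValueError("notify_url is required when enable_notify is true")
    return payload


payload = build_async_payload(
    "今天天气好晴朗", "wav", 16000, enable_subtitle=True, speech_rate=0
)
```

Optional parameters that are left out are simply omitted from the body, so the service defaults listed above apply.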
Example request:
curl -X POST 'http://localhost:8080/tts/async' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer xxxx-1" \
--data '{
"text": "今天天气好晴朗",
"voice": "微软-磁性男声",
"format": "wav",
"sample_rate": 16000,
"enable_subtitle": true,
"enable_notify": false,
"speech_rate": 0
}'
Example response:
{
"data": {"task_id":"b686a398866742498d4ea835143f5174"},
"error_code":20000000,
"error_message":"SUCCESS",
"request_id":"ce55760d-43c7-4133-9478-ca6d744fd517",
"status":200
}
Query Task (/tts/query)
Method: GET
Query parameters: request_id (string, required), task_id (string, required).
Example request:
curl -X GET 'http://localhost:8080/tts/query?request_id=ce55760d-...&task_id=b686a398...' \
--header "Authorization: Bearer xxx-1"
Example response (includes audio download address and sentence‑level timing):
{
"data":{
"audio_address":"http://.../audio.wav",
"sentences":[{"id":0,"text":"今天天气好晴朗","begin_time":170,"end_time":1795}]
},
"error_code":20000000,
"error_message":"SUCCESS",
"pod_ip":"11.70.176.21",
"request_id":"tmp",
"status":200
}
Calling Notes
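The suffix of the token in the Authorization header selects the engine channel, as in Bearer xxxx-1 in the example requests: -1 for the self‑built engine, -2 for Alibaba Cloud, and -5 for ByteDance (where model_flag chooses between the large‑model and premium long‑text modes).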
Use the returned task_id to poll the query endpoint until the task completes, then download the audio from audio_address.
When enable_subtitle is true, the response contains a sentences array with per‑sentence text and timestamps, useful for caption display or video alignment.
API Key Acquisition
An API Key/Token is required for authentication. Steps:
Open the API Marketplace at https://zyun.360.cn/product/apimarket.
Locate the audio service and create an application under the speech synthesis section.
Obtain the API Key from the application details.
Include the header Authorization: Bearer <Your-API-Key> in all API calls.
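As a rough illustration of the last step, the sketch below prepares a request to /tts/async with the Bearer header using only the Python standard library. The helper name build_tts_request is hypothetical, and the key and base URL are the placeholders from the curl examples rather than real values.

```python
import json
import urllib.request
from typing import Any, Dict


def build_tts_request(
    api_key: str, payload: Dict[str, Any], base_url: str
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /tts/async with the Bearer header."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tts/async",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_tts_request(
    api_key="xxxx-1",  # placeholder key, as in the curl examples
    payload={"text": "今天天气好晴朗", "format": "wav", "sample_rate": 16000},
    base_url="http://localhost:8080",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.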
Collaboration
The TTS Fusion Service has completed core functionality and is open for pilot integration across industries. Partners can use the standard API to integrate and extend speech synthesis capabilities.
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.
