Tag

TTS


Amap Tech
May 27, 2025 · Artificial Intelligence

Gaode Map Custom Voice Pack: End‑to‑End TTS Model Architecture and Deployment

This article explains how Gaode Map leverages lightweight edge TTS models, dual‑autoregressive large‑model data augmentation, and a configurable audio‑processing DAG to enable users to create highly realistic personalized voice packs from just three recorded sentences.

Gaode Maps · TTS · data augmentation
8 min read
DaTaobao Tech
Mar 31, 2025 · Artificial Intelligence

AI Audio Generation and Voice Synthesis Practices at Taobao

The article surveys Taobao’s AI‑generated audio pipeline, detailing eight technical papers on image‑to‑video, OpenAI o1, multimodal video, and large‑model voice synthesis, while highlighting advances like VALL‑E, CosyVoice, F5‑TTS, data‑cleaning methods, and e‑commerce applications such as voice‑cloned live streams, multilingual TTS, AI video‑audio integration, and audiobook production.

AI audio · TTS · data cleaning
11 min read
DataFunTalk
Jun 3, 2024 · Artificial Intelligence

Deploying Speech AI Services Quickly with NVIDIA Riva

This article explains how to use NVIDIA Riva to rapidly deploy speech AI services, covering Riva's overview, Chinese ASR model updates, TTS capabilities, customization options, the Quickstart tool, and a Q&A session that clarifies deployment, model fine‑tuning, and integration with NeMo and Triton.

ASR · GPU Acceleration · NVIDIA Riva
13 min read
DataFunTalk
Feb 13, 2024 · Artificial Intelligence

An Overview of NVIDIA NeMo: Open‑Source Framework for Speech AI, ASR, TTS, NLP and Large Language Model Training

This article introduces NVIDIA’s open‑source NeMo framework, detailing its PyTorch‑based architecture for Speech AI, ASR and TTS training, NLP and LLM support, GPU‑optimized parallelism, pre‑trained model resources, fine‑tuning techniques, and the accompanying NeMo Aligner and Framework tools.

ASR · NVIDIA NeMo · PyTorch
18 min read
DataFunTalk
Jan 26, 2024 · Artificial Intelligence

Efficient Deployment of Speech AI Models on GPUs

This article explains how to efficiently deploy speech AI models—including ASR and TTS—on GPUs using NVIDIA's Triton Inference Server and TensorRT, covering background challenges, GPU‑based solutions, decoding optimizations, Whisper acceleration with TensorRT‑LLM, streaming TTS improvements, voice‑cloning pipelines, future plans, and a Q&A session.

ASR · GPU · Inference
20 min read
HomeTech
Dec 6, 2023 · Artificial Intelligence

Metaverse-Based Virtual Humans: Technologies and Applications in Intelligent Q&A

This article explores the concept of the metaverse and virtual humans, detailing 3D modeling techniques, NLP-driven language understanding, streaming TTS, VR/AR interaction, AIGC content generation, and the deployment of a large‑model intelligent Q&A system with real‑time facial expression synthesis for virtual anchors.

3D Modeling · AIGC · Artificial Intelligence
8 min read
DataFunSummit
Aug 15, 2023 · Artificial Intelligence

AI Sales Assistant: Few‑Shot Voice Cloning and Multi‑Accent Naturalness Optimization

The article presents 58 Tongcheng AI Lab's AI sales assistant, detailing its background, a few‑shot voice‑cloning pipeline built on real dialogue data, data preprocessing, FastSpeech2‑based acoustic modeling, multi‑accent style transfer, deployment architecture, controllable synthesis parameters, and future research directions.

AI sales assistant · FastSpeech2 · Speech Synthesis
20 min read
DataFunSummit
Apr 18, 2023 · Artificial Intelligence

Best Practices for Deploying Speech AI on GPUs with Triton and TensorRT

This article presents comprehensive best‑practice guidelines for deploying conversational speech AI—including ASR and TTS pipelines—on GPU servers using NVIDIA Triton Inference Server and TensorRT, covering workflow overview, performance optimizations, streaming inference, and real‑world deployment tips.

ASR · GPU Deployment · Speech AI
14 min read
DataFunTalk
Jul 7, 2022 · Artificial Intelligence

Huawei Translation’s Achievements and Technical Solutions in IWSLT 2022 Speech Translation Tasks

This article reviews Huawei Translation’s top-ranking results in the IWSLT 2022 speech translation competition across speech‑to‑speech, offline speech‑to‑text, and length‑controlled translation tasks, and details their cascade and end‑to‑end technical approaches, including domain‑controlled ASR, context‑aware MT re‑ranking, and VITS‑based TTS.

ASR · Huawei · IWSLT
13 min read
Yuewen Technology
Oct 15, 2021 · Artificial Intelligence

How Yuedu's TTS Platform Automates High‑Quality Audiobook Production

This article explains how Yuedu's TTS synthesis platform tackles the booming audiobook market by using AI‑driven text preprocessing, role graph construction, content structuring, emotion and effect recognition, and a streamlined post‑processing workflow to efficiently generate multi‑character, emotionally rich audiobooks at scale.

Audiobook · Emotion Recognition · NLP
13 min read
58 Tech
Dec 28, 2020 · Backend Development

Implementation of SIP‑Based DTMF Signal Capture for Intelligent Voice Robots

This article explains how an intelligent voice robot leverages TTS and SIP to convert server alerts into spoken notifications, detailing the end‑to‑end workflow, DTMF transmission methods, SIP detection techniques, SDP media negotiation, and RTP‑based DTMF parsing to enable reliable key‑press handling.

DTMF · RTP · SIP
8 min read
360 Quality & Efficiency
May 10, 2019 · Artificial Intelligence

Smart Speaker Voice Interaction Platform: Concepts, Processes, and Testing Metrics

This article introduces the architecture of smart speaker voice interaction systems, covering wake‑word activation, automatic speech recognition (ASR), natural language understanding (NLU), skill processing, text‑to‑speech synthesis (TTS), and the key performance and testing metrics for each component.

ASR · NLU · TTS
11 min read
Tencent Cloud Developer
Feb 26, 2019 · Artificial Intelligence

Tencent Cloud Intelligent Speech Technology: Development, Challenges and Practical Applications

Tencent Cloud's intelligent speech platform combines high‑accuracy ASR, advanced WaveNet‑based TTS, and solutions for noise, far‑field, and dialect challenges, enabling voice input, transcription, and customer‑service bots, with real‑world deployments in finance, museums, hotels, and other industry scenarios.

ASR · Human-Computer Interaction · Natural Language Processing
8 min read
Tencent Cloud Developer
Sep 30, 2018 · Artificial Intelligence

Smart Speaker Voice Interaction Technology: Recent Advances and Tencent's Research Progress

The article surveys Tencent’s recent advances in smart‑speaker voice interaction, detailing a full technology chain—from front‑end capture, wake‑up and enhancement, through speaker verification and short‑speech voiceprint, to TDNN/LSTM speech recognition, target speaker extraction, and end‑to‑end attention modeling for robust, personalized performance.

Speech Recognition · TTS · attention mechanism
18 min read