Tagged articles

speech AI

14 articles · Page 1 of 1

Jun 4, 2026 · Artificial Intelligence

Bridging the Speech Modality Gap with Domain Knowledge Enhancement

The article analyzes recent end‑to‑end speech models, compares four knowledge‑enhancement architectures, evaluates their technical mechanisms and trade‑offs, and outlines how these approaches can be applied in the insurance and finance sectors to build real‑time, domain‑aware voice agents.

S2S architecture · domain fine‑tuning · insurance AI

0 likes · 12 min read

Weekly Large Model Application

May 5, 2026 · Artificial Intelligence

Task Alignment: How to Give Your Speech Model a Job Handbook

The article explains how to transform a pretrained speech model into a product‑ready assistant by defining demonstration data, clarifying team debates on persona, safety, and length, contrasting alignment with pretraining, and highlighting common pitfalls to avoid during deployment.

Dialogue Systems · Safety · data annotation

0 likes · 6 min read

Weekly Large Model Application

May 5, 2026 · Artificial Intelligence

What Do End‑to‑End Speech Large Models Actually Learn? A Four‑Step Diagram

The article distinguishes two meanings of “end‑to‑end,” then outlines four sequential stages—defining data and scenario, massive pre‑training on audio‑text pairs, task alignment via instruction or supervised fine‑tuning, and optional preference tuning—to guide engineers in building usable speech assistants.

audio data · end-to-end models · instruction fine-tuning

0 likes · 6 min read

Weekly Large Model Application

May 5, 2026 · Artificial Intelligence

Where Is End‑to‑End Speech AI Heading? Product vs Engineering Perspectives

The article clarifies the dual meaning of “end‑to‑end” in speech AI—product simplicity and engineering unification—then outlines six emerging trends, from real‑time conversational latency to multilingual robustness, token‑based audio pipelines, voice‑specific security, edge privacy, and the growing importance of data quality and reproducibility.

End-to-End · Real-time Interaction · edge computing

0 likes · 8 min read

AI Explorer

Apr 3, 2026 · Artificial Intelligence

VibeVoice: Open‑Source Real‑Time TTS and 60‑Minute ASR from Microsoft

VibeVoice is a Microsoft‑backed open‑source framework that combines streaming text‑to‑speech and ultra‑long audio speech‑to‑text capabilities, offering multilingual models, low‑latency generation, speaker diarization, and easy deployment via Hugging Face, positioning it as a commercial‑grade alternative for developers.

Hugging Face · Microsoft · long-form ASR

0 likes · 7 min read

Weekly Large Model Application

Feb 20, 2026 · Artificial Intelligence

Intelligent Speech vs. Voice Agent: Key Differences and How They Relate

This article explains the technical distinction between intelligent speech (a toolbox of ASR, TTS, NLU, and NLG technologies) and the Voice Agent (an end‑to‑end conversational system built on those tools and large‑model reasoning), illustrating their layered relationship, functional gaps, and typical use cases.

ASR · Dialogue Systems · Large Language Model

0 likes · 7 min read

DataFunTalk

Jun 3, 2024 · Artificial Intelligence

Deploying Speech AI Services Quickly with NVIDIA Riva

This article explains how to use NVIDIA Riva to rapidly deploy speech AI services, covering Riva's overview, Chinese ASR model updates, TTS capabilities, customization options, the Quickstart tool, and a Q&A session that clarifies deployment, model fine‑tuning, and integration with NeMo and Triton.

ASR · GPU Acceleration · NVIDIA Riva

0 likes · 13 min read

DataFunTalk

Feb 13, 2024 · Artificial Intelligence

An Overview of NVIDIA NeMo: Open‑Source Framework for Speech AI, ASR, TTS, NLP and Large Language Model Training

This article introduces NVIDIA’s open‑source NeMo framework, detailing its PyTorch‑based architecture for Speech AI, ASR and TTS training, NLP and LLM support, GPU‑optimized parallelism, pre‑trained model resources, fine‑tuning techniques, and the accompanying NeMo Aligner and Framework tools.

ASR · NVIDIA NeMo · PyTorch

0 likes · 18 min read

DataFunTalk

Jan 26, 2024 · Artificial Intelligence

Efficient Deployment of Speech AI Models on GPUs

This article explains how to efficiently deploy speech AI models—including ASR and TTS—on GPUs using NVIDIA's Triton Inference Server and TensorRT, covering background challenges, GPU‑based solutions, decoding optimizations, Whisper acceleration with TensorRT‑LLM, streaming TTS improvements, voice‑cloning pipelines, future plans, and a Q&A session.

ASR · GPU · TTS

0 likes · 20 min read

DataFunSummit

May 4, 2023 · Artificial Intelligence

An Overview of NVIDIA NeMo for Speech AI: ASR Training, Chinese Support, and Related Applications

This article provides a comprehensive introduction to NVIDIA's NeMo toolkit for conversational AI, detailing its ASR capabilities, model architectures, training workflow, Chinese language support, deployment options, and additional speech AI features such as VAD and speaker diarization.

ASR · Chinese Speech · Conformer

0 likes · 15 min read

DataFunSummit

Apr 18, 2023 · Artificial Intelligence

Best Practices for Deploying Speech AI on GPUs with Triton and TensorRT

This article presents comprehensive best‑practice guidelines for deploying conversational speech AI—including ASR and TTS pipelines—on GPU servers using NVIDIA Triton Inference Server and TensorRT, covering workflow overview, performance optimizations, streaming inference, and real‑world deployment tips.

ASR · Conversational AI · GPU deployment

0 likes · 14 min read

DataFunSummit

Dec 3, 2021 · Artificial Intelligence

Real‑Time Voice Dialogue: Practices, Challenges, and Duplex Conversation

This article presents an in‑depth overview of Alibaba's real‑time voice dialogue system, covering the Hotline XiaoMi robot, the unique challenges of spoken interactions such as colloquialism, multimodality and duplex communication, and the research advances in ASR‑robust SLU, emotion detection, colloquial processing, and duplex conversation modeling.

ASR · Multimodal · SLU

0 likes · 22 min read

58 Tech

May 28, 2019 · Artificial Intelligence

Architecture and Design of an AI‑Powered Voice Robot System

The article describes the design and implementation of a voice robot platform, covering its background, layered architecture, dialogue flow, intent recognition techniques, micro‑service backend, and future improvements, highlighting how AI models and telephony integration enable automated multi‑turn voice interactions for sales and service scenarios.

Microservices · Telephony · dialogue system

0 likes · 11 min read

iQIYI Technical Product Team

Sep 7, 2018 · Artificial Intelligence

iQIYI Technical Salon – AI Technology Practice and Application (Chengdu Session)

On August 25, iQIYI’s Chengdu R&D Center hosted its second Technical Salon, featuring talks on AI-driven content understanding for short‑video feeds, speech synthesis and editing, industry‑standard speech recognition, semantic search ranking, anti‑spam UGC text analysis, and concluding with recruitment invites and a preview of the upcoming Shanghai salon.

AI · UGC Text Analysis · content understanding

0 likes · 6 min read
