Tagged articles
13 articles
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

Task Alignment: How to Give Your Speech Model a Job Handbook

The article explains how to transform a pretrained speech model into a product‑ready assistant by defining demonstration data, clarifying team debates on persona, safety, and length, contrasting alignment with pretraining, and highlighting common pitfalls to avoid during deployment.

Dialogue Systems, Safety, Speech AI
0 likes · 6 min read
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

What Do End‑to‑End Speech Large Models Actually Learn? A Four‑Step Diagram

The article distinguishes two meanings of “end‑to‑end,” then outlines four sequential stages—defining data and scenario, massive pre‑training on audio‑text pairs, task alignment via instruction or supervised fine‑tuning, and optional preference tuning—to guide engineers in building usable speech assistants.

Speech AI, audio data, end-to-end models
0 likes · 6 min read
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

Where Is End‑to‑End Speech AI Heading? Product vs Engineering Perspectives

The article clarifies the dual meaning of “end‑to‑end” in speech AI—product simplicity and engineering unification—then outlines six emerging trends, from real‑time conversational latency to multilingual robustness, token‑based audio pipelines, voice‑specific security, edge privacy, and the growing importance of data quality and reproducibility.

Edge Computing, End-to-End, Speech AI
0 likes · 8 min read
AI Explorer
Apr 3, 2026 · Artificial Intelligence

VibeVoice: Open‑Source Real‑Time TTS and 60‑Minute ASR from Microsoft

VibeVoice is a Microsoft‑backed open‑source framework that combines streaming text‑to‑speech and ultra‑long audio speech‑to‑text capabilities, offering multilingual models, low‑latency generation, speaker diarization, and easy deployment via Hugging Face, positioning it as a commercial‑grade alternative for developers.

Hugging Face, Microsoft, Speech AI
0 likes · 7 min read
DataFunTalk
Jun 3, 2024 · Artificial Intelligence

Deploying Speech AI Services Quickly with NVIDIA Riva

This article explains how to use NVIDIA Riva to rapidly deploy speech AI services, covering Riva's overview, Chinese ASR model updates, TTS capabilities, customization options, the Quickstart tool, and a Q&A session that clarifies deployment, model fine‑tuning, and integration with NeMo and Triton.

ASR, GPU Acceleration, NVIDIA Riva
0 likes · 13 min read
DataFunTalk
Feb 13, 2024 · Artificial Intelligence

An Overview of NVIDIA NeMo: Open‑Source Framework for Speech AI, ASR, TTS, NLP and Large Language Model Training

This article introduces NVIDIA’s open‑source NeMo framework, detailing its PyTorch‑based architecture for Speech AI, ASR and TTS training, NLP and LLM support, GPU‑optimized parallelism, pre‑trained model resources, fine‑tuning techniques, and the accompanying NeMo Aligner and Framework tools.

ASR, NVIDIA NeMo, PyTorch
0 likes · 18 min read
DataFunTalk
Jan 26, 2024 · Artificial Intelligence

Efficient Deployment of Speech AI Models on GPUs

This article explains how to efficiently deploy speech AI models—including ASR and TTS—on GPUs using NVIDIA's Triton Inference Server and TensorRT, covering background challenges, GPU‑based solutions, decoding optimizations, Whisper acceleration with TensorRT‑LLM, streaming TTS improvements, voice‑cloning pipelines, future plans, and a Q&A session.

ASR, GPU, Inference
0 likes · 20 min read
DataFunSummit
Apr 18, 2023 · Artificial Intelligence

Best Practices for Deploying Speech AI on GPUs with Triton and TensorRT

This article presents comprehensive best‑practice guidelines for deploying conversational speech AI—including ASR and TTS pipelines—on GPU servers using NVIDIA Triton Inference Server and TensorRT, covering workflow overview, performance optimizations, streaming inference, and real‑world deployment tips.

ASR, Conversational AI, GPU deployment
0 likes · 14 min read
DataFunSummit
Dec 3, 2021 · Artificial Intelligence

Real‑Time Voice Dialogue: Practices, Challenges, and Duplex Conversation

This article presents an in‑depth overview of Alibaba's real‑time voice dialogue system, covering the Hotline XiaoMi robot, the unique challenges of spoken interactions such as colloquialism, multimodality and duplex communication, and the research advances in ASR‑robust SLU, emotion detection, colloquial processing, and duplex conversation modeling.

ASR, SLU, Speech AI
0 likes · 22 min read
58 Tech
May 28, 2019 · Artificial Intelligence

Architecture and Design of an AI‑Powered Voice Robot System

The article describes the design and implementation of a voice robot platform, covering its background, layered architecture, dialogue flow, intent recognition techniques, micro‑service backend, and future improvements, highlighting how AI models and telephony integration enable automated multi‑turn voice interactions for sales and service scenarios.

Microservices, Speech AI, Telephony
0 likes · 11 min read
iQIYI Technical Product Team
Sep 7, 2018 · Artificial Intelligence

iQIYI Technical Salon – AI Technology Practice and Application (Chengdu Session)

On August 25, iQIYI’s Chengdu R&D Center hosted its second Technical Salon, featuring talks on AI-driven content understanding for short‑video feeds, speech synthesis and editing, industry‑standard speech recognition, semantic search ranking, and anti‑spam UGC text analysis. The event concluded with recruitment invitations and a preview of the upcoming Shanghai salon.

AI, Speech AI, UGC Text Analysis
0 likes · 6 min read