Tagged articles
4 articles
Page 1 of 1
Weekly Large Model Application
Weekly Large Model Application
Mar 17, 2026 · Artificial Intelligence

Essential Features Every Voice Interaction System Must Support

The article provides a comprehensive analysis of core voice interaction system capabilities—including barge‑in, turn‑taking, multi‑turn dialogue, intent recognition, speaker identification, streaming latency, noise robustness, multilingual support, emotion handling, personalization, security, and deployment considerations—highlighting typical scenarios such as smart speakers, in‑car assistants, call centers, and meeting transcription.

ASRLatencyMultimodal
0 likes · 11 min read
Essential Features Every Voice Interaction System Must Support
Old Zhang's AI Learning
Old Zhang's AI Learning
Feb 1, 2026 · Artificial Intelligence

Microsoft VibeVoice‑ASR Open‑Source: One‑Shot 60‑Minute Transcription with Speaker ID and Timestamps

Microsoft’s newly open‑sourced VibeVoice‑ASR model can transcribe up to 60‑minute audio in a single pass, preserving global context while providing built‑in speaker diarization and timestamps, supports 50+ languages, offers custom hot‑word injection, and can be deployed via Docker, Gradio, or vLLM for high‑throughput API services.

ASRDockerLoRA
0 likes · 9 min read
Microsoft VibeVoice‑ASR Open‑Source: One‑Shot 60‑Minute Transcription with Speaker ID and Timestamps
iQIYI Technical Product Team
iQIYI Technical Product Team
Nov 7, 2024 · Artificial Intelligence

Multimodal Speaker Diarization for Long-Form Video Scripts

iQIYI’s multimodal speaker diarization system splits long‑form video using subtitle timestamps and scene detection, extracts voiceprints with a custom model, hierarchically clusters them, and applies an Activate Speaker Detection algorithm combined with face‑recognition to assign speakers, achieving around 90 % precision and recall and boosting downstream tasks such as summarization, translation, and dubbing.

Multimodal AIdialogue recognitioniQIYI
0 likes · 8 min read
Multimodal Speaker Diarization for Long-Form Video Scripts
58 Tech
58 Tech
Aug 7, 2020 · Artificial Intelligence

Technical Overview of 58.com Intelligent Voice Analysis Platform

The article presents a comprehensive technical overview of 58.com’s intelligent voice analysis platform, detailing its business background, system architecture, speech and NLP technologies, speaker diarization methods, model performance, data labeling workflow, and practical applications in call‑center quality inspection and user profiling.

AI Platformdata labelingnatural language processing
0 likes · 11 min read
Technical Overview of 58.com Intelligent Voice Analysis Platform