Intelligent Voice Quality Inspection System Architecture and Implementation at 58.com

The article details the design and deployment of an AI-powered intelligent voice quality inspection system at 58.com, covering its overall architecture, speech recognition, role identification, tag detection, rechecking platform, and backend infrastructure, and demonstrates its impact on call‑center efficiency and service quality.

58 Tech

Aug 3, 2020

Voice communication is central to the 58.com life‑service platform. The intelligent voice quality inspection system ingests call recordings in real time, converts speech to text, and applies natural language processing (NLP) to the transcripts to help improve the service quality of call‑center agents.

Traditional manual quality inspection relies on human auditors listening to a small fraction of recordings, which is inefficient and low‑coverage. With advances in speech recognition and NLP, an AI‑driven system can automatically evaluate every utterance.

The system’s overall architecture consists of a foundational layer (ASR and NLP modules), a data layer (Kafka, proprietary message bus, storage), a logic layer (role identification, semantic tagging, scoring), an editing/operation layer (web‑based annotation and analysis), and a web‑management layer for human re‑inspection.

Speech recognition (ASR) transcribes the recordings. In dual‑channel recordings each speaker is already on a separate channel, so each channel is transcribed on its own. For mono recordings, voice activity detection and speaker diarization first separate the speaker segments, which are then transcribed; diarization quality is measured by Diarization Error Rate (DER).
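DER is conventionally defined as the sum of missed speech, false‑alarm speech, and speaker‑confusion time, divided by the total reference speech time. The sketch below computes it over fixed‑length frames. It is an illustration only: the function name and the frame‑level input format are assumptions, it assumes hypothesis speaker labels have already been mapped to reference labels, and it ignores overlapping speech and scoring collars.

```python
from typing import List, Optional


def diarization_error_rate(
    reference: List[Optional[str]],
    hypothesis: List[Optional[str]],
) -> float:
    """Frame-level DER. Each list holds one speaker label per frame,
    or None for non-speech. Labels are assumed to be already aligned
    between reference and hypothesis (no optimal mapping is done here)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1          # speech in reference, none detected
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # speech detected where there is none

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy example: 4 reference speech frames, one error of each kind.
ref = ["A", "A", "B", "B", None, None]
hyp = ["A", "B", "B", None, "A", None]
der = diarization_error_rate(ref, hyp)  # (1 + 1 + 1) / 4
```

Because false alarms are counted in the numerator but not in the denominator, DER can exceed 1.0 on poor output.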

Role identification first uses a gender‑recognition model (VGGish + Bi‑LSTM + attention) to match speaker gender with known agent data; if inconclusive, a generic model applies TextCNN for semantic scoring and a Transformer for correcting ambiguous cases, achieving about 85% accuracy.
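The two‑stage fallback described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the `Speaker` type, `assign_roles`, and the keyword‑based `toy_agent_score` are hypothetical stand‑ins, and the real gender and text models (VGGish + Bi‑LSTM + attention, TextCNN, Transformer) are replaced by precomputed fields and a trivial scorer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Speaker:
    speaker_id: str
    predicted_gender: Optional[str]  # "male", "female", or None if unknown
    transcript: str


def toy_agent_score(text: str) -> float:
    """Hypothetical stand-in for a text model: fraction of agent-style
    cue phrases found in the transcript."""
    cues = ("how can i help", "thank you for calling", "our service")
    lowered = text.lower()
    return sum(cue in lowered for cue in cues) / len(cues)


def assign_roles(
    a: Speaker,
    b: Speaker,
    agent_gender: Optional[str],
    score_agent_text: Callable[[str], float] = toy_agent_score,
) -> Dict[str, str]:
    """Stage 1: if exactly one speaker matches the known agent gender,
    that speaker is the agent. Stage 2: otherwise fall back to text scoring."""
    matches = [
        s for s in (a, b)
        if agent_gender is not None and s.predicted_gender == agent_gender
    ]
    if len(matches) == 1:
        agent = matches[0]
    else:
        # Gender is inconclusive (both or neither match): use the text model.
        score_a = score_agent_text(a.transcript)
        score_b = score_agent_text(b.transcript)
        agent = a if score_a >= score_b else b
    customer = b if agent is a else a
    return {agent.speaker_id: "agent", customer.speaker_id: "customer"}


s0 = Speaker("spk0", "male", "I want to ask about my listing")
s1 = Speaker("spk1", "female", "Thank you for calling, how can I help?")
by_gender = assign_roles(s0, s1, agent_gender="female")
by_text = assign_roles(s0, s1, agent_gender=None)
```

Checking gender first keeps the more expensive text path for the calls where the speaker genders alone cannot decide the roles.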

Tag recognition employs TextCNN, Transformer, and BERT models, supplemented by rule‑based logic, to detect compliance‑related tags such as “over‑promise” or “agent abuse.” A domain‑adapted BERT, pre‑trained on massive ASR‑derived transcripts, yields the best results, reaching 90% accuracy for sales and 87% for customer‑service inspections.
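One plausible way to combine model predictions with rule‑based logic is to take the union of both signals, as in the sketch below. The article does not say how the two are merged, so the union strategy, the regular expressions, the threshold, and the names `RULES` and `detect_tags` are all assumptions made for illustration; the classifier is passed in as a plain callable rather than a real BERT model.

```python
import re
from typing import Callable, Dict, List, Pattern

# Hypothetical rule set: tag name -> patterns that trigger it.
RULES: Dict[str, List[Pattern[str]]] = {
    "over-promise": [re.compile(r"\bguarantee[ds]?\b", re.IGNORECASE)],
    "agent abuse": [re.compile(r"\b(idiot|stupid)\b", re.IGNORECASE)],
}


def detect_tags(
    utterance: str,
    model_scores: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the sorted tags triggered by either a rule or the model."""
    tags = set()

    # Rule path: any matching pattern fires the tag.
    for tag, patterns in RULES.items():
        if any(p.search(utterance) for p in patterns):
            tags.add(tag)

    # Model path: any tag scoring at or above the threshold fires.
    for tag, score in model_scores(utterance).items():
        if score >= threshold:
            tags.add(tag)

    return sorted(tags)


def silent_model(_: str) -> Dict[str, float]:
    """Stand-in classifier that never fires, so only rules apply."""
    return {"over-promise": 0.0, "agent abuse": 0.0}


rule_hit = detect_tags("We guarantee fifty leads this week", silent_model)
model_hit = detect_tags("fine", lambda _: {"agent abuse": 0.9})
```

A union favors recall, which suits a workflow where human auditors recheck flagged utterances; an intersection or a rule‑override scheme would trade recall for precision.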

The re‑inspection subsystem displays results on a web platform, allowing auditors to jump to tagged utterances, listen to audio snippets, and manually add or correct tags, boosting human efficiency by two‑ to three‑fold compared with pure manual review.

Backend services run on 58’s self‑developed RPC framework SCF, using WMonitor for observability and a mix of storage solutions (WOS, Redis, WTable, WCS, MySQL). Micro‑services include data ingestion, core processing, ASR, speaker‑identification, and tag analysis, with asynchronous callbacks for long‑running ASR tasks and online model inference via WPAI.
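The asynchronous‑callback pattern for long‑running ASR tasks can be illustrated with Python's `asyncio`. This is an in‑process sketch only: SCF's actual callback mechanism is not described in the article, and `fake_asr`, `submit_asr`, and `process_calls` are hypothetical names. The point is that the caller submits a task, continues without blocking, and receives the transcript later through a callback.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Callback = Callable[[str, str], Awaitable[None]]


async def fake_asr(call_id: str) -> str:
    """Stand-in for a slow ASR service call."""
    await asyncio.sleep(0.01)
    return f"transcript for {call_id}"


def submit_asr(call_id: str, on_done: Callback) -> "asyncio.Task[None]":
    """Start transcription in the background and return immediately.
    The callback is invoked once the transcript is ready.
    Must be called from inside a running event loop."""
    async def run() -> None:
        text = await fake_asr(call_id)
        await on_done(call_id, text)

    return asyncio.create_task(run())


async def process_calls(call_ids: List[str]) -> Dict[str, str]:
    results: Dict[str, str] = {}

    async def on_done(call_id: str, text: str) -> None:
        # In a real pipeline this would hand the transcript to the
        # role-identification and tag-analysis stages.
        results[call_id] = text

    tasks = [submit_asr(cid, on_done) for cid in call_ids]
    await asyncio.gather(*tasks)  # wait only so the demo can return results
    return results


transcripts = asyncio.run(process_calls(["call-1", "call-2"]))
```

In a distributed deployment the callback would typically be a remote call or a message on a queue rather than a local coroutine, but the control flow is the same.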

Deployed at scale, the system processes tens of thousands of calls daily across 13 business lines, saving nearly a thousand person‑hours and markedly improving call‑center efficiency and service quality, while also being applicable to broader C2B voice analytics scenarios.
