Intelligent Voice Quality Inspection System Architecture and Implementation at 58.com
The article details the design and deployment of an AI-powered intelligent voice quality inspection system at 58.com, covering its overall architecture, speech recognition, role identification, tag detection, re-inspection platform, and backend infrastructure, and describes its impact on call-center efficiency and service quality.
On 58.com's local-services platform, voice communication is crucial. The intelligent voice quality inspection system ingests call recordings in real time, converts speech to text, and applies natural language processing (NLP) to the transcripts to improve the service quality of call-center agents.
Traditional manual quality inspection relies on human auditors listening to a small fraction of recordings, which is inefficient and low‑coverage. With advances in speech recognition and NLP, an AI‑driven system can automatically evaluate every utterance.
The system’s overall architecture consists of a foundational layer (ASR and NLP modules), a data layer (Kafka, proprietary message bus, storage), a logic layer (role identification, semantic tagging, scoring), an editing/operation layer (web‑based annotation and analysis), and a web‑management layer for human re‑inspection.
For dual-channel recordings, the speech recognition (ASR) module transcribes each channel as a separate speaker stream. For mono recordings, voice activity detection and speaker diarization first split the audio into per-speaker segments, which are then transcribed. Diarization quality is measured by Diarization Error Rate (DER).
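As a reminder of what DER measures, the sketch below uses the standard definition: the sum of false-alarm, missed-speech, and speaker-confusion time, divided by the total reference speech time. The function name and the durations are illustrative assumptions, not figures from the 58.com system.

```python
def diarization_error_rate(
    false_alarm: float, missed: float, confusion: float, total_speech: float
) -> float:
    """Standard DER: all error time divided by reference speech time (seconds)."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


# Made-up durations for one mono recording (not measured data).
der = diarization_error_rate(
    false_alarm=3.0, missed=5.0, confusion=4.0, total_speech=120.0
)
print(f"DER = {der:.1%}")
```

A lower DER means speaker segments are attributed more accurately, which matters downstream because role identification and tagging both depend on knowing who said what.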
Role identification first uses a gender‑recognition model (VGGish + Bi‑LSTM + attention) to match speaker gender with known agent data; if inconclusive, a generic model applies TextCNN for semantic scoring and a Transformer for correcting ambiguous cases, achieving about 85% accuracy.
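The cascade described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `identify_agent`, the `margin` threshold, and the keyword scorer are hypothetical stand-ins (the real system uses a VGGish + Bi-LSTM gender model, TextCNN, and a Transformer, none of which are reproduced here).

```python
from typing import Callable, Dict, Optional


def identify_agent(
    predicted_gender: Dict[str, Optional[str]],  # speaker id -> "F", "M", or None
    agent_gender: Optional[str],                 # gender on record for the agent
    transcripts: Dict[str, str],                 # speaker id -> transcript text
    primary_score: Callable[[str], float],       # stand-in for the TextCNN scorer
    tiebreak_score: Callable[[str], float],      # stand-in for the Transformer
    margin: float = 0.2,                         # hypothetical ambiguity threshold
) -> str:
    # Stage 1: gender match is conclusive only if exactly one speaker matches.
    if agent_gender is not None:
        matches = [s for s, g in predicted_gender.items() if g == agent_gender]
        if len(matches) == 1:
            return matches[0]
    # Stage 2: score how agent-like each speaker's text is.
    scores = {s: primary_score(t) for s, t in transcripts.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) == 1 or scores[ranked[0]] - scores[ranked[1]] >= margin:
        return ranked[0]
    # Stage 3: scores are too close, so defer to the second model.
    return max(transcripts, key=lambda s: tiebreak_score(transcripts[s]))


def keyword_score(text: str) -> float:
    """Toy scorer: fraction of typical agent phrases present in the text."""
    cues = ("how may i help", "thank you for calling")
    return sum(cue in text.lower() for cue in cues) / len(cues)


transcripts = {
    "A": "Thank you for calling, how may I help you?",
    "B": "I want to post a listing.",
}
# Both speakers are predicted female, so the gender stage is inconclusive.
by_text = identify_agent({"A": "F", "B": "F"}, "F", transcripts,
                         keyword_score, keyword_score)
# Only speaker B matches the agent's recorded gender.
by_gender = identify_agent({"A": "M", "B": "F"}, "F", transcripts,
                           keyword_score, keyword_score)
```

Putting the cheap, high-precision signal first means the text models are consulted only when the gender check cannot separate the two speakers.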
Tag recognition employs TextCNN, Transformer, and BERT models, supplemented by rule‑based logic, to detect compliance‑related tags such as “over‑promise” or “agent abuse.” A domain‑adapted BERT, pre‑trained on massive ASR‑derived transcripts, yields the best results, reaching 90% accuracy for sales and 87% for customer‑service inspections.
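One plausible way to combine model predictions with rule-based logic is to take the union of both sources per utterance. The sketch below assumes this union strategy; the rule patterns, tag identifiers, threshold, and the `fake_model` stand-in are all hypothetical and do not reflect the production rules or the BERT model.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rule table: tag name -> pattern that triggers it.
RULES = {
    "over_promise": re.compile(r"\bguaranteed?\b", re.IGNORECASE),
}


def detect_tags(
    utterance: str,
    model_scores: Callable[[str], Dict[str, float]],  # stand-in for the classifier
    threshold: float = 0.5,                           # assumed decision threshold
) -> List[str]:
    # Tags the model is confident about.
    tags = {tag for tag, p in model_scores(utterance).items() if p >= threshold}
    # Tags fired by rules, regardless of the model's score.
    tags |= {tag for tag, pattern in RULES.items() if pattern.search(utterance)}
    return sorted(tags)


def fake_model(utterance: str) -> Dict[str, float]:
    """Toy classifier that flags utterances containing the word 'idiot'."""
    abusive = 0.9 if "idiot" in utterance.lower() else 0.05
    return {"over_promise": 0.1, "agent_abuse": abusive}


rule_hit = detect_tags("We guarantee ten new customers this week.", fake_model)
model_hit = detect_tags("Are you an idiot?", fake_model)
clean = detect_tags("Your listing has been renewed.", fake_model)
```

Rules of this kind give auditors a predictable safety net for phrases that must always be flagged, while the model covers paraphrases that no fixed pattern anticipates.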
The re‑inspection subsystem displays results on a web platform, allowing auditors to jump to tagged utterances, listen to audio snippets, and manually add or correct tags, boosting human efficiency by two‑ to three‑fold compared with pure manual review.
Backend services run on 58’s self‑developed RPC framework SCF, using WMonitor for observability and a mix of storage solutions (WOS, Redis, WTable, WCS, MySQL). Micro‑services include data ingestion, core processing, ASR, speaker‑identification, and tag analysis, with asynchronous callbacks for long‑running ASR tasks and online model inference via WPAI.
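The asynchronous callback pattern for long-running ASR tasks can be illustrated with the Python standard library. This is only a sketch of the general idea: it uses a local thread pool rather than SCF or WPAI, and `transcribe`, `on_asr_done`, and the call IDs are hypothetical names.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


def transcribe(call_id: str) -> Dict[str, str]:
    """Stand-in for a slow ASR job; sleeps instead of decoding audio."""
    time.sleep(0.05)
    return {"call_id": call_id, "text": f"<transcript of {call_id}>"}


results: Dict[str, str] = {}


def on_asr_done(future: Future) -> None:
    """Callback invoked once a transcription finishes; hands the text downstream."""
    payload = future.result()
    results[payload["call_id"]] = payload["text"]


# Submitting returns immediately, so the caller is never blocked on ASR.
with ThreadPoolExecutor(max_workers=2) as pool:
    for call_id in ("call-001", "call-002"):
        pool.submit(transcribe, call_id).add_done_callback(on_asr_done)

# Leaving the `with` block waits for all jobs, so every callback has run by now.
print(results)
```

In a service setting the callback would typically enqueue the transcript for role identification and tag analysis instead of writing to an in-memory dictionary.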
Deployed at scale, the system processes tens of thousands of calls daily across 13 business lines, saving nearly a thousand person‑hours and markedly improving call‑center efficiency and service quality, while also being applicable to broader C2B voice analytics scenarios.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.