Comprehensive Solution for Human‑Machine Voice Dialogue Robot at 58.com
This article presents 58.com’s complete solution for a human‑machine voice dialogue robot: its background, overall architecture, intelligent outbound process, core functions (call service, anti‑spam, call status recognition, multi‑turn dialogue management, intent classification, slot extraction, and whole‑round intent detection), and practical application scenarios.
In this article we introduce the complete solution of the human‑machine voice dialogue robot developed by 58.com, focusing on the design and implementation of dialogue strategy management, automatic call dialing, intent recognition, anti‑spam control, and other core modules, and explain how the robot is applied in various business scenarios to improve sales, operations, and customer service efficiency.
Background – 58.com is the largest lifestyle information service platform in China, covering recruitment, automobiles, finance, local services, second‑hand goods, etc. Phone communication is a key channel for many business modules, especially recruitment, which requires repetitive tasks such as confirming job seeker information, scheduling interviews, and follow‑up calls. To reduce manual workload and improve service quality, a voice robot was built to handle phone conversations.
Compared with manual dialing, the robot offers stable emotional tone, lower long‑term cost, and consistent performance. After evaluating market solutions, 58.com decided to develop its own robot to quickly adapt to changing scenarios and solve personalized business problems.
Overall Architecture
1. Access Layer : Provides API interfaces for business systems to invoke the robot; after a call ends, results are returned via WMB for asynchronous processing and feedback.
2. Web Management Layer : Handles script configuration, permission control, batch dialing, anti‑spam policies, and data visualization.
3. Logic Layer : Core control layer that orchestrates the entire dialogue flow.
4. Editing & Operations Layer : Used for data annotation, which feeds model iteration and online evaluation.
5. Infrastructure Layer : Includes SIP phone resources for dialing and third‑party speech recognition/synthesis services (e.g., Alibaba, Tencent).
Intelligent Outbound Process – Divided into pre‑call, in‑call, and post‑call stages. Pre‑call sets up strategies such as anti‑spam logic, SIP selection, and script loading. In‑call streams synthesized speech to the user, captures user responses, performs real‑time speech‑to‑text conversion, and triggers appropriate dialogue actions. Post‑call performs status judgment, whole‑round intent recognition, data storage, and callback via WMB.
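As an illustration of the pre‑call stage, the anti‑spam check can be pictured as a gate that runs before any SIP resource is allocated. The sketch below is a minimal example and not 58.com's implementation: the names `AntiSpamPolicy` and `may_dial`, the 09:00 to 20:00 window, the limit of two calls per day, and the choice to let the whitelist bypass the time and frequency checks are all assumptions made for illustration. Emotion detection is left out because it depends on in‑call signals.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta


@dataclass
class AntiSpamPolicy:
    # Every default here is an illustrative assumption, not a 58.com setting.
    blacklist: set = field(default_factory=set)
    whitelist: set = field(default_factory=set)
    window_start: time = time(9, 0)        # earliest allowed dialing time
    window_end: time = time(20, 0)         # latest allowed dialing time (exclusive)
    max_calls: int = 2                     # calls allowed per number per period
    period: timedelta = timedelta(days=1)  # length of the frequency window


def may_dial(number, now, history, policy):
    """Return (allowed, reason) for a planned outbound call.

    history is a list of datetimes at which this number was called before.
    """
    if number in policy.blacklist:
        return False, "blacklisted"
    # Assumption: whitelisted numbers skip the time and frequency checks.
    if number in policy.whitelist:
        return True, "whitelisted"
    if not (policy.window_start <= now.time() < policy.window_end):
        return False, "outside time window"
    recent = [t for t in history if now - t < policy.period]
    if len(recent) >= policy.max_calls:
        return False, "frequency limit reached"
    return True, "ok"
```

A caller would run `may_dial` for each number in a batch and skip, or reschedule, those that are rejected; the returned reason can be logged for the data visualization in the web management layer.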
Core Functions
1. Call Service : Establishes phone connections using the open‑source JAIN‑SIP library, managing resources to improve connection rates.
2. Anti‑Spam Strategy : Includes whitelist/blacklist, time‑window control, frequency limits, and user emotion detection to avoid over‑calling.
3. Call Status Recognition : Determines whether a number is valid, busy, or disconnected by combining SIP status codes with classification of the audio heard while the call is ringing.
4. Intelligent Dialogue Interaction – Consists of dialogue management, DTMF key capture, single‑sentence intent recognition, standard question matching, and slot extraction.
• Dialogue Management : Converts user speech to text, performs NLU (intent and slot detection), maps to system actions based on a predefined script library, and generates responses.
• Key Capture : Parses DTMF key presses, which can arrive as RTP events or as SIP messages, depending on what was negotiated in SDP.
• Single‑Sentence Intent Recognition : Uses a TextCNN model (19 intent categories) for multi‑class classification; experiments with BERT fine‑tuned on internal data show a ~2% accuracy gain.
• Standard Question Matching : Employs a Bi‑LSTM‑DSSM architecture to match user queries with a curated FAQ database; BERT encodings (max‑pool) are also evaluated.
• Slot Extraction : Implements IDCNN+CRF to extract entities from dialogue text.
5. Whole‑Round Intent Recognition : Aggregates all user utterances in a call and classifies the overall intent (e.g., SUCCESS, CENTRAL, REFUSED) using TextCNN and multi‑model fusion; evaluation includes offline human review and online A/B testing, with speech recognition errors being the main accuracy bottleneck.
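The article does not say how the models are fused for whole‑round intent recognition, so the following is only a sketch of one common option, weighted soft voting: each model outputs a probability vector over the whole‑round labels, and the vectors are averaged before the top label is picked. The function names, the `[SEP]` separator used to join utterances, and the example probabilities are all hypothetical; the label names are the examples given above.

```python
LABELS = ["SUCCESS", "CENTRAL", "REFUSED"]  # example labels from the article


def join_utterances(utterances, sep=" [SEP] "):
    """Concatenate the user's turns into one text for a whole-call classifier.

    Empty turns (e.g. silence transcribed as "") are dropped.
    """
    return sep.join(u.strip() for u in utterances if u.strip())


def fuse(prob_lists, weights=None):
    """Weighted average of per-model probability vectors (soft voting).

    prob_lists holds one probability vector per model, ordered as LABELS.
    Returns the winning label and the fused probability vector.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    fused = [
        sum(w * p[i] for w, p in zip(weights, prob_lists)) / total
        for i in range(len(LABELS))
    ]
    best = max(range(len(LABELS)), key=fused.__getitem__)
    return LABELS[best], fused


# Hypothetical outputs of three models for the same call.
text = join_utterances(["hello ", "", "not interested"])
label, probs = fuse([[0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1]])
```

Because the input text comes from speech recognition, transcription errors reach every model in the ensemble, which is consistent with the article's observation that recognition errors are the main accuracy bottleneck.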
Application Scenarios – Beyond sales, the robot is used for notifications, satisfaction surveys, information verification, alerts, sales‑assistant training, and internal operation alerts. Four concrete cases are presented: campus recruitment efficiency, customer service efficiency, operations efficiency, and sales efficiency, each showing how the robot reduces manual effort and improves conversion.
Conclusion – Voice dialogue technology has been successfully deployed across multiple business modules at 58.com, including sales, notifications, and internal alerts. The article systematically describes the robot’s architecture, core capabilities, and practical use cases.
Author : Li Zhong, AI Lab Algorithm Architect, 58.com.
Source : 58 Technology Salon, edited by Zhao Wang, DataFun community.