Speaker Role Recognition in an Intelligent Voice Analysis Platform
This article describes a speaker role recognition system for a voice analysis platform, detailing a gender‑based pre‑filter, keyword‑matching and TextCNN‑based text classification, and single‑sentence correction methods that together improve role assignment accuracy by about 6% over baseline third‑party solutions.
Introduction
The 58.com life‑service platform generates massive call recordings, which contain valuable information. An intelligent voice analysis platform converts speech to text and applies natural language processing to improve call‑center service quality and the efficiency of connections between customers and merchants.
Background
Speaker role identification (hereafter, role recognition) determines whether a speaker is an agent or a customer. The platform first separates the speakers and transcribes the audio, then runs a role‑recognition module before any downstream task.
Method Overview
The overall strategy first uses a gender‑recognition model to check whether the two speakers are of different genders. If they are, the agent’s gender is looked up in a database and the speaker whose gender matches is assigned as the agent. Otherwise, a two‑step optimization is performed: an initial role assignment, followed by single‑sentence role correction.
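The speaker-level decision described above can be sketched as follows. This is a minimal illustration, not the platform's code: the function name `assign_agent`, the gender labels, and the `text_fallback` callable are hypothetical, and the sketch covers only the choice of which speaker is the agent, not the later sentence-level correction.

```python
from typing import Callable, Optional, Sequence


def assign_agent(
    genders: Sequence[Optional[str]],
    agent_gender: Optional[str],
    text_fallback: Callable[[], int],
) -> int:
    """Return the index (0 or 1) of the speaker treated as the agent.

    genders: predicted gender per speaker; None means the gender model
             could not be trusted for that speaker (assumption).
    agent_gender: the agent's gender as stored in the database, if known.
    text_fallback: the text-based strategy used when gender cannot decide.
    """
    g0, g1 = genders
    different = g0 is not None and g1 is not None and g0 != g1
    if different and agent_gender in (g0, g1):
        # Gender prior applies: the speaker matching the stored gender is the agent.
        return 0 if g0 == agent_gender else 1
    # Same gender, unknown gender, or no database entry: use the text strategy.
    return text_fallback()
```

For example, `assign_agent(["female", "male"], "male", fallback)` returns 1 without calling `fallback`, while two speakers of the same gender always go through the text-based path.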
Gender Recognition
A VGGish + Bi‑LSTM + Attention model predicts speaker gender. FBank features extracted from the audio waveform pass through VGGish, which produces a 128‑dimensional embedding that is fed to a Bi‑LSTM with attention. The model reaches 93.16% accuracy. To limit the impact of its errors, a prediction is used only when its confidence exceeds 0.99 and both genders have more than one utterance; under these conditions, gender‑based role assignment has a coverage of 19.05%.
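The gating rule can be sketched as below. The 0.99 threshold and the "more than one utterance per gender" condition come from the text; everything else is an assumption, in particular that predictions are made per utterance as `(gender, confidence)` pairs and that only confident utterances are counted.

```python
from collections import Counter
from typing import List, Tuple

CONFIDENCE_THRESHOLD = 0.99  # value stated in the article


def confident_gender_counts(predictions: List[Tuple[str, float]]) -> Counter:
    """Count utterances per gender, keeping only confident predictions."""
    return Counter(
        gender for gender, confidence in predictions
        if confidence > CONFIDENCE_THRESHOLD
    )


def gender_rule_applies(predictions: List[Tuple[str, float]]) -> bool:
    """True when both genders have more than one confident utterance."""
    counts = confident_gender_counts(predictions)
    return all(counts[gender] > 1 for gender in ("male", "female"))
```

A strict gate like this trades coverage for precision: calls that fail it are simply handed to the text-based strategy.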
Initial Role Assignment
Agents and customers exhibit distinct lexical patterns: agents use business‑related terminology, while customers tend to give short responses. The initial strategy extracts n‑gram keywords and sentence semantics (via a TextCNN model) to score each speaker’s likelihood of being an agent or a customer, then assigns roles based on the higher score.
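One way to picture the speaker-level scoring is the sketch below. The article does not say how keyword hits and TextCNN outputs are combined, so the equal weighting here is purely an assumption, and `agent_prob` is a hypothetical stand-in for the classifier (any callable returning the probability that a sentence was spoken by an agent).

```python
from typing import Callable, Dict, Iterable, List


def agent_score(
    utterances: Iterable[str],
    agent_keywords: Iterable[str],
    customer_keywords: Iterable[str],
    agent_prob: Callable[[str], float],
) -> float:
    """Higher score means the speaker looks more like an agent."""
    agent_keywords = list(agent_keywords)
    customer_keywords = list(customer_keywords)
    score = 0.0
    for text in utterances:
        # Keyword evidence: +1 per agent keyword hit, -1 per customer keyword hit.
        score += sum(kw in text for kw in agent_keywords)
        score -= sum(kw in text for kw in customer_keywords)
        # Semantic evidence: classifier probability centered on 0.5.
        score += agent_prob(text) - 0.5
    return score


def pick_agent(
    speakers: Dict[str, List[str]],
    agent_keywords: Iterable[str],
    customer_keywords: Iterable[str],
    agent_prob: Callable[[str], float],
) -> str:
    """Return the id of the speaker with the highest agent score."""
    agent_keywords = list(agent_keywords)
    customer_keywords = list(customer_keywords)
    return max(
        speakers,
        key=lambda s: agent_score(
            speakers[s], agent_keywords, customer_keywords, agent_prob
        ),
    )
```

The remaining speaker is then labeled as the customer.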
Single‑Sentence Role Correction
Because speaker‑separation errors can mix utterances, a correction step examines individual sentences. Two sub‑strategies are applied: n‑gram keyword matching and TextCNN‑based text classification. Sentences with a prediction confidence above 0.998 are reassigned, improving overall sentence‑level role accuracy.
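The correction step can be sketched as follows. The 0.998 threshold is from the text; the `classify` callable is a hypothetical stand-in for either sub-strategy (keyword matching or TextCNN) and is assumed to return a `(role, confidence)` pair for a sentence.

```python
from typing import Callable, List, Tuple

CORRECTION_THRESHOLD = 0.998  # value stated in the article


def correct_roles(
    sentences: List[Tuple[str, str]],
    classify: Callable[[str], Tuple[str, float]],
) -> List[Tuple[str, str]]:
    """Reassign a sentence's role only when the prediction is very confident.

    sentences: (text, role) pairs produced by the initial role assignment.
    """
    corrected = []
    for text, role in sentences:
        predicted, confidence = classify(text)
        if predicted != role and confidence > CORRECTION_THRESHOLD:
            role = predicted  # overrule the speaker-level assignment
        corrected.append((text, role))
    return corrected
```

Sentences whose prediction falls at or below the threshold keep the role inherited from their speaker, so the step can only change labels where the evidence is strong.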
TextCNN Model
Each sentence is embedded and fed to a TextCNN classifier. When the predicted probability for a role exceeds a threshold (0.998), the role is accepted. Under this setting, TextCNN reaches 89.93% accuracy.
n‑Gram Keyword Extraction Strategy
Candidate keywords are n‑grams, evaluated by how their frequency is distributed between agent and customer utterances. An n‑gram is retained as a keyword when its frequency ratio between the two roles is greater than 9:1, yielding an 89.51% accuracy for keyword‑based role assignment.
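A minimal sketch of the ratio filter is shown below. The 9:1 ratio is from the text; the maximum n-gram length, the `min_count` floor, and the assumption that utterances arrive already tokenized (for Chinese, already word-segmented) are illustrative choices, not details given in the article.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(utterances: List[List[str]], max_n: int = 2) -> Counter:
    """Count every 1..max_n gram over a list of tokenized utterances."""
    counts = Counter()
    for tokens in utterances:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def extract_keywords(
    agent_utts: List[List[str]],
    customer_utts: List[List[str]],
    ratio: float = 9.0,
    min_count: int = 10,  # assumed floor so rare n-grams are not kept by chance
    max_n: int = 2,
) -> Dict[str, str]:
    """Map each retained n-gram to the role it indicates."""
    agent_counts = count_ngrams(agent_utts, max_n)
    customer_counts = count_ngrams(customer_utts, max_n)
    keywords = {}
    for gram in set(agent_counts) | set(customer_counts):
        a, c = agent_counts[gram], customer_counts[gram]
        if a + c < min_count:
            continue
        if a > ratio * c:
            keywords[gram] = "agent"
        elif c > ratio * a:
            keywords[gram] = "customer"
    return keywords
```

Comparing `a > ratio * c` instead of dividing avoids a division by zero when an n-gram never appears for one of the roles.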
Summary
By combining gender priors, keyword matching, and TextCNN‑based text classification, the system improves role‑recognition accuracy by about 6% compared with the original third‑party solution. Future work includes training separate models for different business lines and exploring pre‑trained models for better text classification.
Department Introduction
58.com TEG Technology Engineering Platform AI Lab focuses on applying AI across the company, building a central AI capability platform to boost business efficiency and user experience. Products include intelligent customer service, voice bots, automated writing, voice analysis, marketing systems, algorithm platforms, and speech recognition.
Author
Yin Zilong – AI Lab algorithm engineer at 58.com TEG Technology Engineering Platform.