Speaker Role Recognition in an Intelligent Voice Analysis Platform

This article describes a speaker role recognition system for a voice analysis platform, detailing a gender‑based pre‑filter, keyword‑matching and TextCNN‑based text classification, and single‑sentence correction methods that together improve role assignment accuracy by about 6% over baseline third‑party solutions.

58 Tech

Oct 9, 2020

Introduction

The 58.com life‑service platform generates a massive volume of call recordings that contain valuable information. An intelligent voice analysis platform converts the speech to text and applies natural language processing to it, with the goal of improving call‑center service quality and making connections between customers and merchants more efficient.

Background

Speaker role identification (hereafter, role recognition) determines whether a speaker is an agent or a customer. The platform first separates the speakers and transcribes the audio, then runs the role‑recognition module before any downstream tasks.

Method Overview

The overall strategy first runs a gender‑recognition model to check whether the two parties are of different genders. If they are, the agent’s gender is looked up in a database and the speaker whose gender matches is labeled the agent. Otherwise, a two‑step procedure is applied: an initial role assignment followed by single‑sentence role correction.
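The branching described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `recognize_roles`, the `fallback` callable (standing in for the two‑step text‑based procedure), and the dictionary shapes are hypothetical and do not come from the original system.

```python
from typing import Callable, Dict, Optional


def recognize_roles(
    speaker_genders: Optional[Dict[str, str]],
    agent_gender: Optional[str],
    fallback: Callable[[], Dict[str, str]],
) -> Dict[str, str]:
    """Return a speaker-id -> role mapping for a two-party call.

    speaker_genders: trusted gender per speaker, or None if the gender
        model's output could not be used (hypothetical input shape).
    agent_gender: the agent's gender as looked up in a database.
    fallback: the text-based path (initial assignment + correction).
    """
    if speaker_genders and agent_gender and len(speaker_genders) == 2:
        (spk_a, g_a), (spk_b, g_b) = speaker_genders.items()
        # The gender prior only helps when the two parties differ.
        if g_a != g_b and agent_gender in (g_a, g_b):
            agent = spk_a if g_a == agent_gender else spk_b
            customer = spk_b if agent == spk_a else spk_a
            return {agent: "agent", customer: "customer"}
    # Same gender, or no trusted gender information: use the text path.
    return fallback()
```

Passing the text‑based path as a callable keeps it lazy, so the keyword and classifier steps run only when the gender shortcut does not apply.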

Gender Recognition

The gender model chains VGGish, a Bi‑LSTM, and an attention layer: FBank features are extracted from the audio waveform, VGGish turns them into 128‑dimensional embeddings, and the Bi‑LSTM + Attention layers predict the speaker’s gender with 93.16% accuracy. To limit the impact of misclassifications, the prediction is used only when its confidence exceeds 0.99 and both genders have more than one utterance, which gives gender‑based role assignment a coverage of 19.05%.
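A minimal sketch of that gating rule is shown below. The 0.99 threshold and the "more than one utterance" condition come from the text. The input shape is an assumption, as is the choice to count only utterances that pass the confidence threshold, which the article does not spell out.

```python
from collections import Counter
from typing import List, Tuple

CONFIDENCE_THRESHOLD = 0.99  # stated in the article
MIN_UTTERANCES = 2           # "more than one utterance" per gender


def gender_prior_usable(predictions: List[Tuple[str, float]]) -> bool:
    """Decide whether the gender prior may be used for one call.

    predictions: one (predicted_gender, confidence) pair per utterance,
        with gender labels "female" / "male" (hypothetical format).
    Assumption: only confident predictions are counted.
    """
    confident = Counter(
        gender for gender, conf in predictions if conf > CONFIDENCE_THRESHOLD
    )
    return all(confident[g] >= MIN_UTTERANCES for g in ("female", "male"))
```

A gate this strict trades coverage for precision, which is consistent with the relatively low coverage figure reported above.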

Initial Role Assignment

Agents and customers exhibit distinct lexical patterns: agents use business‑related terminology, while customers tend to give short responses. The initial strategy extracts n‑gram keywords and sentence semantics (via a TextCNN model) to score each speaker’s likelihood of being an agent or a customer, then assigns roles based on the higher score.
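The article does not give the scoring formula, so the following is a hypothetical sketch of how keyword hits and classifier output might be combined per speaker. The additive score, the substring matching, and the `agent_prob` callable (standing in for the TextCNN model) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


def speaker_agent_score(
    utterances: Iterable[str],
    agent_keywords: Iterable[str],
    customer_keywords: Iterable[str],
    agent_prob: Callable[[str], float],
) -> float:
    """Higher score = more agent-like (assumed additive scoring)."""
    agent_keywords = list(agent_keywords)
    customer_keywords = list(customer_keywords)
    score = 0.0
    for text in utterances:
        score += sum(kw in text for kw in agent_keywords)     # agent evidence
        score -= sum(kw in text for kw in customer_keywords)  # customer evidence
        score += agent_prob(text) - 0.5                       # classifier vote
    return score


def assign_initial_roles(
    call: Dict[str, List[str]],
    agent_keywords: Iterable[str],
    customer_keywords: Iterable[str],
    agent_prob: Callable[[str], float],
) -> Dict[str, str]:
    """Label the highest-scoring speaker as the agent, the other as customer."""
    agent_keywords = list(agent_keywords)
    customer_keywords = list(customer_keywords)
    scores = {
        spk: speaker_agent_score(utts, agent_keywords, customer_keywords, agent_prob)
        for spk, utts in call.items()
    }
    agent = max(scores, key=scores.get)
    return {spk: ("agent" if spk == agent else "customer") for spk in scores}
```

For example, with a neutral classifier stub (`lambda text: 0.5`), a speaker who says "this is customer service" would outscore one who only answers "ok" if "customer service" is an agent keyword and "ok" a customer keyword.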

Single‑Sentence Role Correction

Because speaker‑separation errors can mix utterances, a correction step examines individual sentences. Two sub‑strategies are applied: n‑gram keyword matching and TextCNN‑based text classification. Sentences with a prediction confidence above 0.998 are reassigned, improving overall sentence‑level role accuracy.
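The correction step can be sketched as below. The 0.998 threshold is from the text. The order in which the two sub‑strategies are tried, and the `keyword_role` and `classify` callables, are assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple

CORRECTION_THRESHOLD = 0.998  # stated in the article


def correct_sentence_role(
    text: str,
    assigned_role: str,
    keyword_role: Callable[[str], Optional[str]],
    classify: Callable[[str], Tuple[str, float]],
) -> str:
    """Return the (possibly corrected) role for a single sentence.

    keyword_role: returns "agent"/"customer" on a keyword match, else None.
    classify: returns (predicted_role, confidence), e.g. from TextCNN.
    Assumption: keyword matches are tried before the classifier.
    """
    matched = keyword_role(text)
    if matched is not None:
        return matched
    label, confidence = classify(text)
    if confidence > CORRECTION_THRESHOLD:
        return label
    # Not enough evidence: keep the role from the initial assignment.
    return assigned_role
```

Keeping the initial role as the default means a sentence is moved only when one of the two signals is very strong, which limits the damage a noisy classifier can do.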

TextCNN Model

Each sentence is embedded and fed to a TextCNN classifier. When the predicted probability for a role exceeds a threshold (0.998), the role is accepted. Under this setting, TextCNN reaches 89.93% accuracy.
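For readers unfamiliar with TextCNN, here is a dependency‑free sketch of its forward pass: filters of several widths slide over the token embeddings, each filter is reduced by max‑over‑time pooling, and a linear layer with softmax produces role probabilities. The dimensions, filter widths, and random weights are placeholders, not the article's hyperparameters, which are not given.

```python
import math
import random
from typing import List

Vector = List[float]


def conv_max_pool(embeddings: List[Vector], kernel: List[Vector], bias: float) -> float:
    """Apply one filter of width len(kernel), then ReLU and max-over-time pooling."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result if the sentence is shorter than the filter
    for start in range(len(embeddings) - width + 1):
        act = bias
        for offset in range(width):
            act += sum(e * k for e, k in zip(embeddings[start + offset], kernel[offset]))
        best = max(best, act)
    return best


def softmax(logits: Vector) -> Vector:
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(embeddings, filters, biases, out_weights, out_bias) -> Vector:
    """Return class probabilities, e.g. [p_agent, p_customer]."""
    features = [conv_max_pool(embeddings, k, b) for k, b in zip(filters, biases)]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(logits)


# Toy usage with random placeholder weights (untrained, illustration only).
random.seed(0)
dim = 4
sentence = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
filters = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(w)] for w in (2, 3, 4)]
biases = [0.0, 0.0, 0.0]
out_weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
probs = textcnn_forward(sentence, filters, biases, out_weights, [0.0, 0.0])
accepted = max(probs) > 0.998  # the article's acceptance threshold
```

In practice a model like this would be built and trained with a deep learning framework. The pure‑Python version only makes the data flow explicit.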

n‑Gram Keyword Extraction Strategy

Candidate keywords are n‑grams, selected by comparing how often each one occurs in agent utterances versus customer utterances. Only n‑grams whose frequency ratio between the two roles exceeds 9:1 are retained, yielding 89.51% accuracy for keyword‑based role assignment.
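The ratio filter can be sketched as follows. The 9:1 ratio is from the text. The n‑gram lengths, the `min_count` cutoff, and the assumption that utterances arrive already tokenized are illustrative choices, not details from the article.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(utterances: Iterable[Sequence[str]], n_values: Sequence[int]) -> Counter:
    return Counter(g for toks in utterances for n in n_values for g in ngrams(toks, n))


def extract_role_keywords(
    agent_utts: Iterable[Sequence[str]],
    customer_utts: Iterable[Sequence[str]],
    n_values: Sequence[int] = (1, 2, 3),  # assumed n-gram lengths
    min_ratio: float = 9.0,               # the 9:1 ratio from the article
    min_count: int = 5,                   # assumed cutoff for rare n-grams
) -> Dict[NGram, str]:
    """Map each retained n-gram to the role it strongly indicates."""
    agent_counts = count_ngrams(agent_utts, n_values)
    customer_counts = count_ngrams(customer_utts, n_values)
    keywords: Dict[NGram, str] = {}
    for gram in agent_counts.keys() | customer_counts.keys():
        a, c = agent_counts[gram], customer_counts[gram]
        if a + c < min_count:
            continue  # too rare to trust the ratio
        if a > min_ratio * c:
            keywords[gram] = "agent"
        elif c > min_ratio * a:
            keywords[gram] = "customer"
    return keywords
```

The minimum‑count cutoff is there because an n‑gram seen once in agent speech and never in customer speech would otherwise pass any ratio test.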

Summary

By combining gender priors, keyword matching, and TextCNN‑based text classification, the system improves role‑recognition accuracy by about 6% compared with the original third‑party solution. Future work includes training separate models for different business lines and exploring pre‑trained models for better text classification.

Department Introduction

58.com TEG Technology Engineering Platform AI Lab focuses on applying AI across the company, building a central AI capability platform to boost business efficiency and user experience. Products include intelligent customer service, voice bots, automated writing, voice analysis, marketing systems, algorithm platforms, and speech recognition.

Author

Yin Zilong – AI Lab algorithm engineer at 58.com TEG Technology Engineering Platform.
