
Improving Text Representation and Clustering for Small‑Sample Scenarios in 58.com Used‑Car Customer Service with a Bi‑LSTM Pre‑trained Language Model and Deep Clustering

This article describes how 58.com improved text representation and clustering purity in its small‑sample used‑car customer‑service scenario by introducing a Bi‑LSTM‑based pre‑trained language model and an improved Deep Embedded Clustering (DEC) algorithm, yielding measurable gains in classification accuracy, silhouette score, and online answer rate.

58 Tech

Background – 58.com’s intelligent customer‑service system ("Bangbang") has been deployed across various business lines since 2017. In 2019 it was extended to C‑end users and B‑end merchants, forming the "Bangbang Merchant" version. The system saves the effort of hundreds of human agents and improves efficiency, but the small‑sample nature of the used‑car domain leads to weak text representation and low clustering purity.

Problem Statement – Two key challenges were identified: (1) how to obtain robust representations for queries in a small‑sample setting to capture diverse phrasings of the same intent, and (2) how to discover new user questions to improve coverage of the automated QA robot.

Bi‑LSTM Pre‑trained Language Model – Inspired by BERT’s masked‑LM task, a Bi‑LSTM encoder was pre‑trained on 40 million unlabeled sentences from the used‑car domain, retaining only the masked‑LM objective. To reduce computational cost, the Transformer encoder was replaced with a Bi‑LSTM, and residual add‑&‑norm blocks were added around it. The model was trained for 300k iterations on a single NVIDIA Tesla P40 GPU (≈28 h). On a 26k‑sample downstream classification task, pre‑training lifted the Bi‑LSTM’s accuracy from 0.8107 to 0.8662, outperforming a generic Chinese BERT model (0.8487 after 5 epochs).

| Model | Bi‑LSTM | BERT |
| --- | --- | --- |
| Acc with pre‑training | 0.8662 | 0.8487 (5 epochs) / 0.8530 (10 epochs) |
| Acc without pre‑training | 0.8107 | 0.7884 (5 epochs) / 0.8342 (10 epochs) |
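The masked‑LM corruption step behind this kind of pre‑training can be sketched as follows. The article does not give these details, so BERT's standard 80/10/10 corruption scheme is assumed, and the token ids and vocabulary size are illustrative:

```python
import random

MASK_ID = 103          # illustrative [MASK] token id
VOCAB_SIZE = 21128     # illustrative vocabulary size (typical for Chinese BERT)

def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """BERT-style corruption: of the selected positions, 80% become [MASK],
    10% a random token, 10% are left unchanged. Returns (inputs, labels),
    where labels hold the original id at corrupted positions and -1 elsewhere,
    so the loss is computed only on the corrupted positions."""
    rng = random.Random(seed)
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels.append(tok)               # the model must predict this token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID          # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # else: 10% keep the original token unchanged
        else:
            labels.append(-1)                # position not scored by the loss
    return inputs, labels
```

The corrupted `inputs` are fed to the Bi‑LSTM encoder, and only the positions with a non‑negative label contribute to the masked‑LM loss.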

DEC Algorithm Description – DEC jointly learns feature representations and cluster assignments. It consists of two stages: (1) pre‑training an auto‑encoder to obtain initial embeddings, and (2) fine‑tuning the encoder together with soft cluster assignments by minimizing KL‑divergence between the learned distribution q and a target distribution p. The original DEC uses K‑means for initializing centroids.
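The two DEC quantities described above — the soft assignment q (a Student's t kernel around each centroid, as in Xie et al. 2016) and the sharpened target distribution p — can be sketched in NumPy (variable names are illustrative; this is not the authors' code):

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """DEC soft assignment: Student's t kernel between embeddings z (n, d)
    and cluster centroids (k, d). Each row of the returned q sums to 1."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p: squares q and normalises by cluster frequency,
    so confident assignments are emphasised when minimising KL(p || q)."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)
```

During fine‑tuning, p is recomputed periodically from q and the encoder is updated by gradient descent on the KL term.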

Improvements to DEC – To better suit the small‑sample scenario, the authors replaced K‑means centroids with custom centroids computed as the mean vectors of all known expanded question variants for each standard question. This guides the clustering toward the manually curated distribution.
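A minimal sketch of this centroid initialization, assuming each labeled variant already has an embedding and a standard‑question id (function and variable names are illustrative):

```python
import numpy as np

def init_centroids(embeddings, question_ids):
    """Replace K-means initialization with custom centroids: the mean
    embedding of all known expanded variants of each standard question.
    embeddings: (n, d) array; question_ids: length-n standard-question ids.
    Returns (centroids, sorted list of question ids)."""
    ids = sorted(set(question_ids))
    qarr = np.asarray(question_ids)
    centroids = np.stack([embeddings[qarr == i].mean(axis=0) for i in ids])
    return centroids, ids
```

These centroids then seed DEC's soft-assignment stage, pulling the clusters toward the manually curated question taxonomy rather than an arbitrary K‑means partition.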

Experiments

Three experiments were conducted on a manually labeled small‑sample dataset:

1. K‑means + Word2Vec static representation
2. K‑means + Bi‑LSTM static representation
3. DEC + Bi‑LSTM pre‑trained model (dynamic representation)

| Method | Accuracy | Silhouette | Runtime |
| --- | --- | --- | --- |
| K‑means + Word2Vec | 0.354 | 0.047 | <5 min |
| K‑means + Bi‑LSTM | 0.377 | 0.025 | <5 min |
| DEC + Bi‑LSTM | 0.8437 | 0.142 | 30 min |

The DEC‑based approach achieved a much higher accuracy (0.8437) and silhouette score (0.142) than the K‑means baselines, albeit with longer runtime. Online evaluation showed the weekly answer‑rate improved from 79.71 % to 83.62 % after iteration.
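The article does not define its clustering accuracy metric exactly; a common choice when a manually labeled set is available is cluster purity, sketched here (an assumption, not the authors' stated metric):

```python
from collections import Counter

def clustering_purity(assignments, labels):
    """Purity: each cluster votes for its majority gold label; the score is
    the fraction of all points covered by those majority labels."""
    by_cluster = {}
    for cluster, label in zip(assignments, labels):
        by_cluster.setdefault(cluster, []).append(label)
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return correct / len(labels)
```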

| Metric | Before Iteration | After Iteration |
| --- | --- | --- |
| Answer rate | 79.71 % | 83.62 % |

For the expanded‑question discovery task, the improved DEC increased precision from 98.11 % to 98.24 % and recall from 89.66 % to 92.27 %.

| | Precision | Recall |
| --- | --- | --- |
| Before | 98.11 % | 89.66 % |
| After | 98.24 % | 92.27 % |

Conclusion & Outlook – By adapting the pre‑training task to the vertical domain and customizing DEC centroids, the authors achieved notable improvements in text representation, clustering purity, and downstream QA performance. Future work includes exploring transfer learning between online/offline data, designing more suitable representation networks, and incorporating self‑supervised tasks.

References

Xie, J., Girshick, R., & Farhadi, A. (2016). Unsupervised deep embedding for clustering analysis. ICML.

Aggarwal, C. C., & Zhai, C. (2012). A survey of text clustering algorithms. Mining Text Data.

Aljalbout, E., et al. (2018). Clustering with deep learning: Taxonomy and new methods. arXiv:1801.07648.

Devlin, J., et al. (2018). BERT: Pre‑training of deep bidirectional transformers for language understanding. arXiv:1810.04805.

Vaswani, A., et al. (2017). Attention is all you need. NIPS.
