How HuoLala Leverages AI to Revolutionize Service Quality Inspection

This article describes HuoLala's AI‑driven intelligent quality inspection system: its NLP‑based semantic understanding pipeline, data denoising with confidence learning, contrastive learning, model acceleration (pruning, knowledge distillation, and quantization), and interpretability methods, all aimed at improving coverage, recall, and risk detection.

Huolala Tech

Background

With the rapid growth of HuoLala's user base and business volume, the company has accumulated massive amounts of service data, including call‑center voice recordings, ticket texts, and information from other channels. Extracting service personnel performance, compliance with standards, hidden business opportunities, public sentiment, and risk information from this data is challenging. Traditional manual quality inspection is labor‑intensive, has low coverage, and often misses issues.

Solution

To address these challenges, an intelligent quality inspection system was built on NLP semantic understanding and ASR speech recognition. The workflow (Figure 1) achieves 100% multi‑channel coverage and precise risk identification. Hotline recordings are first transcribed by ASR, and the transcriptions are then processed by NLP in the same way as text from other channels. Business‑specific rules are defined, a semantic‑understanding robot scores each dialogue and aggregates the detailed results, and human reviewers finally check the outcomes.

Semantic Understanding Core

The core consists of text classification and entity recognition built on pretrained language models. Semantic parsing (Figure 2) identifies the intent and entities of each message; these are then combined with predefined rules to infer session‑level events. Events fall into two categories: public‑opinion events (e.g., over‑charging, lost goods, abusive agents) and safety events (e.g., traffic accidents, robbery).
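The step from message-level predictions to session-level events can be sketched roughly as follows. This is only an illustration, not HuoLala's implementation: the intent names, entity types, rule table, and the `infer_session_events` helper are all hypothetical, and the message-level results are hard-coded in place of real model output.

```python
from dataclasses import dataclass, field


@dataclass
class MessageResult:
    """Hypothetical output of the classification and entity recognition models."""
    speaker: str                                   # "user" or "agent"
    intent: str                                    # predicted intent label
    entities: dict = field(default_factory=dict)   # entity type -> value


# Hypothetical rules: event -> (category, required intent, required entity types)
RULES = {
    "over_charging": ("public_opinion", "complain_fee", {"amount"}),
    "lost_goods": ("public_opinion", "report_loss", {"goods"}),
    "traffic_accident": ("safety", "report_accident", set()),
}


def infer_session_events(messages):
    """Return sorted (category, event) pairs whose rule matches any message."""
    events = set()
    for msg in messages:
        for event, (category, intent, required) in RULES.items():
            # A rule fires when the intent matches and all required
            # entity types were extracted from the same message.
            if msg.intent == intent and required <= set(msg.entities):
                events.add((category, event))
    return sorted(events)


session = [
    MessageResult("user", "greeting"),
    MessageResult("user", "complain_fee", {"amount": "50 yuan"}),
    MessageResult("user", "report_accident"),
]
print(infer_session_events(session))
```

Keeping the rules in a data table rather than in code is one way to let business teams adjust event definitions without retraining the models.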

Algorithm Exploration and Practice

Problems and Challenges

Training data contains noisy labels, making cleaning costly.

How to obtain better semantic representations from pretrained models?

Large pretrained models have slow inference, hindering online deployment.

Deep models are black boxes; how to explain their predictions?

Data Denoising

Noise in datasets limits algorithm performance. Two denoising strategies were explored: confusion‑matrix‑based filtering and confidence‑learning‑based filtering, combined via k‑fold cross‑validation (Figure 4).
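A minimal sketch of how k-fold cross-validation can yield out-of-fold predictions and a confusion matrix for spotting suspicious labels. The source does not give the exact filtering rule, so treating every off-diagonal sample as a noise candidate is an assumption here; the nearest-centroid classifier and the one-dimensional toy data are also stand-ins for the real model and dataset.

```python
from collections import Counter


def kfold_indices(n, k):
    """Split range(n) into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_centroids(xs, ys):
    """Toy stand-in for the real classifier: one mean value per class."""
    sums, counts = Counter(), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in counts}


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def out_of_fold_predictions(xs, ys, k=3):
    """Predict each sample with a model that never saw it during training."""
    preds = [None] * len(xs)
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = predict(model, xs[i])
    return preds


xs = [0.1, 0.2, 0.3, 0.15, 0.9, 1.0, 1.1, 0.95, 0.25]
ys = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]  # last label is deliberately wrong

preds = out_of_fold_predictions(xs, ys, k=3)
matrix = Counter(zip(ys, preds))  # (given label, predicted label) -> count
suspects = [i for i, (y, p) in enumerate(zip(ys, preds)) if y != p]
print(matrix)
print("candidate noisy samples:", suspects)
```

Using out-of-fold predictions matters because a model evaluated on its own training data tends to memorize noisy labels and would hide them.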

Confidence Learning Denoising

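Confidence learning (confident learning in Northcutt et al.) accounts for both label noise and class imbalance, which makes denoising more robust. The process (Figure 6) predicts class probabilities for every sample, computes a probability threshold for each class, and flags as noise the samples whose predicted probabilities, measured against these class‑wise thresholds, do not support their given label.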
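The thresholding idea can be sketched as below, following the formulation in Northcutt et al.: the threshold for class j is the mean predicted probability of class j over samples labeled j, and a sample is flagged when it is confidently predicted as a class other than its given label. The probabilities are made up for illustration, and the helper names are hypothetical; whether HuoLala's system uses exactly this rule is not stated in the source.

```python
def class_thresholds(probs, labels, n_classes):
    """t_j = mean predicted probability of class j over samples labeled j."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled samples gets threshold 1.0 (almost never confident).
    return [s / c if c else 1.0 for s, c in zip(sums, counts)]


def find_label_issues(probs, labels):
    """Return indices of samples confidently predicted as a different class."""
    n_classes = len(probs[0])
    thresholds = class_thresholds(probs, labels, n_classes)
    issues = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if confident:
            best = max(confident, key=lambda j: p[j])
            if best != y:
                issues.append(i)
    return issues


# Out-of-sample predicted probabilities (illustrative values only).
probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],
    [0.10, 0.90],
    [0.95, 0.05],  # labeled 1, but the model is confident it is class 0
]
labels = [0, 0, 1, 1, 1]
print(find_label_issues(probs, labels))
```

Because each class has its own threshold, a class on which the model is systematically less confident is not penalized the way it would be under a single global cutoff.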

Contrastive Learning

High‑frequency tokens dominate BERT sentence embeddings, which harms semantic discrimination. Contrastive learning (Figure 8) pulls the representations of positive pairs together and pushes those of negative pairs apart. Sample pairs can be constructed with unsupervised strategies (e.g., embedding‑level perturbations as in ConSERT, or dropout noise as in SimCSE) or with supervised strategies.
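The pull-together, push-apart objective is commonly written as an in-batch InfoNCE loss, as used in SimCSE. The sketch below assumes that form; the temperature value and the two-dimensional toy embeddings are illustrative, and a real system would compute this on encoder outputs inside a deep learning framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive loss.

    positives[i] is the positive for anchors[i]; every other
    positives[j] in the batch serves as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # each positive is close to its anchor
misaligned = [[0.1, 0.9], [0.9, 0.1]]   # positives swapped

print("aligned loss:   ", info_nce(anchors, aligned))
print("misaligned loss:", info_nce(anchors, misaligned))
```

The loss is small when each anchor is most similar to its own positive and grows when another sample in the batch is closer.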

Model Acceleration

Transformer‑based models suffer from slow inference. Acceleration techniques include ONNX/TensorRT inference, model pruning, knowledge distillation, and quantization.

Model Pruning

Many parameters in large models are redundant, and pruning removes low‑importance neurons or connections. Fine‑grained pruning targets individual weights, while coarse‑grained pruning removes entire modules, channels, or vocabulary entries (Figure 12).
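The two granularities can be illustrated on a small weight matrix. This sketch assumes weight magnitude as the importance score, which is one common choice rather than necessarily the criterion used in the system described; the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Fine-grained: zero the `sparsity` fraction of weights with smallest |w|.

    Ties at the threshold are all pruned, so slightly more than the
    requested fraction may be zeroed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(flat) * sparsity)
    if n_prune == 0:
        return [row[:] for row in weights]
    threshold = flat[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def prune_rows(weights, keep):
    """Coarse-grained: keep the `keep` rows with the largest L1 norm."""
    ranked = sorted(
        range(len(weights)),
        key=lambda i: sum(abs(w) for w in weights[i]),
        reverse=True,
    )
    kept = sorted(ranked[:keep])  # preserve the original row order
    return [weights[i][:] for i in kept]


weights = [
    [0.50, -0.01, 0.30],
    [0.02, -0.03, 0.01],
    [-0.70, 0.40, 0.05],
]
print(magnitude_prune(weights, 0.5))  # same shape, small weights zeroed
print(prune_rows(weights, 2))         # smaller matrix, weakest row removed
```

Fine-grained pruning keeps the matrix shape and therefore needs sparse kernels to yield speedups, whereas coarse-grained pruning produces a genuinely smaller dense matrix.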

Knowledge Distillation

Distillation transfers knowledge from a large teacher model to a smaller student model (e.g., DistilBERT, TinyBERT). Knowledge can be distilled at the input, feature, or output layers of the teacher, as illustrated in Figure 14.
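For the output layer, a standard formulation is the soft-target loss of Hinton et al.: the student matches the teacher's temperature-softened distribution while also fitting the hard label. The sketch below assumes that formulation; the temperature, the mixing weight `alpha`, and the logits are illustrative values, not settings reported in the source.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    cross_entropy = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


teacher = [4.0, 1.0, 0.0]
good_student = [3.5, 1.0, 0.2]   # roughly agrees with the teacher
bad_student = [0.0, 1.0, 4.0]    # disagrees with the teacher and the label

print(distillation_loss(good_student, teacher, label=0))
print(distillation_loss(bad_student, teacher, label=0))
```

Feature-layer distillation, as in TinyBERT, would add further terms that match intermediate hidden states or attention maps; those are omitted here.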

Model Quantization

Quantization converts FP32 computations to lower‑precision formats (e.g., FP16) for selected operators, reducing memory use and compute cost while largely preserving accuracy.
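The trade-off between storage and precision can be seen by round-tripping a few values through half precision. This sketch uses only Python's `struct` module (format code `e` is IEEE 754 half precision) and made-up weight values; it illustrates the numeric effect of FP16 and says nothing about how an inference engine selects operators to convert.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Values outside the FP16 range (about +/-65504) raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


weights = [0.1234567, -1.5, 3.14159265, 0.0009765625]  # illustrative values
fp16_weights = [to_fp16(w) for w in weights]

# Largest rounding error introduced by the lower-precision format.
max_abs_error = max(abs(a - b) for a, b in zip(weights, fp16_weights))

# Storage cost of the same values in each format.
fp32_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))

print(fp16_weights)
print("max abs error:", max_abs_error)
print("bytes:", fp32_bytes, "->", fp16_bytes)
```

FP16 has a narrower range and fewer mantissa bits than FP32, which is one reason numerically sensitive operators are often left in FP32.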

Interpretability

Deep models are black boxes; interpretability helps engineers trust and improve them. Two levels are discussed:

Instance‑level explanation: identifies the training samples that support or oppose a prediction (Figure 16).

Feature‑level explanation: evaluates the importance of each token using gradient‑based methods such as plain gradient saliency or integrated gradients (Figure 17).
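For the feature-level case, integrated gradients (Sundararajan et al.) can be sketched on a toy model. The logistic model, its weights, and the all-zero baseline below are assumptions made so the gradient can be written by hand; a real text classifier would attribute over token embeddings using automatic differentiation.

```python
import math

WEIGHTS = [2.0, -1.0, 0.0]  # hypothetical weights, one per input feature
BIAS = -0.5


def model(x):
    """Toy classifier: sigmoid of a linear score."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x):
    """Analytic gradient of the sigmoid output with respect to each input."""
    f = model(x)
    return [f * (1.0 - f) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Average the gradient along the straight path from baseline to x,
    then scale by (x - baseline)."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule for the path integral
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

print("attributions:", attributions)
print("sum of attributions:", sum(attributions))
print("f(x) - f(baseline): ", model(x) - model(baseline))
```

The attributions sum approximately to the difference between the model output at the input and at the baseline, which is the completeness property that distinguishes integrated gradients from plain gradient saliency.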

Conclusion and Outlook

This article presents the exploration and application of semantic understanding technologies in HuoLala's public‑opinion business, addressing its challenges through data denoising, contrastive learning, model acceleration, and interpretability. Future work will extend these techniques to more scenarios to reduce costs and improve efficiency.

References

Vaswani A, et al. Attention is all you need. NeurIPS 2017.

Devlin J, et al. BERT: Pre‑training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018.

Liu Y, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019.

Sun Y, et al. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019.

Sun Y, et al. ERNIE 3.0: Large‑scale Knowledge‑Enhanced Pre‑training. arXiv 2021.

Northcutt CG, et al. Confident Learning: Estimating Uncertainty in Dataset Labels. 2021.

Li B, et al. On the Sentence Embeddings from Pre‑trained Language Models. 2020.

Yan Y, et al. ConSERT: A Contrastive Framework for Self‑Supervised Sentence Representation Transfer. 2021.

Gao T, et al. SimCSE: Simple Contrastive Learning of Sentence Embeddings. 2021.

Li H, et al. Pruning Filters for Efficient ConvNets. arXiv 2016.

Michel P, et al. Are Sixteen Heads Really Better Than One? NeurIPS 2019.

Yang Z, et al. TextPruner: A Model Pruning Toolkit for Pre‑Trained Language Models. arXiv 2022.

Hinton G, et al. Distilling the Knowledge in a Neural Network. arXiv 2015.

Sanh V, et al. DistilBERT, a distilled version of BERT. arXiv 2019.

Jiao X, et al. TinyBERT: Distilling BERT for Natural Language Understanding. arXiv 2019.

Yeh CK, et al. Representer Point Selection for Explaining Deep Neural Networks. NeurIPS 2018.

Simonyan K, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv 2013.

Baehrens D, et al. How to Explain Individual Classification Decisions. arXiv 2009.

Sundararajan M, et al. Axiomatic Attribution for Deep Networks. ICML 2017.

