How HuoLala Leverages AI to Revolutionize Service Quality Inspection
This article describes HuoLala's AI-driven intelligent quality inspection system: its NLP-based semantic understanding pipeline, data denoising with confidence learning, contrastive learning, model acceleration (pruning, knowledge distillation, quantization), and interpretability methods, which together improve inspection coverage, recall, and risk detection.
Background
With the rapid growth of HuoLala's user base and business volume, the company has accumulated massive service data, including call-center voice recordings, ticket texts, and information from other channels. Extracting service personnel performance, compliance with standards, hidden business opportunities, public sentiment, and risk information from this data is challenging. Traditional manual quality inspection is labor-intensive, covers only a small share of sessions, and often misses issues.
Solution
To address these challenges, an intelligent quality inspection system was built on NLP semantic understanding and ASR speech recognition. The workflow (Figure 1) achieves 100% coverage across channels and precise risk identification. Hotline recordings are first transcribed by ASR; the transcripts and the text from other channels are then processed uniformly by NLP. Against business-specific rules, a semantic-understanding robot scores each dialogue and aggregates the detailed results, which human reviewers then check.
Semantic Understanding Core
The core consists of text classification and entity recognition using pretrained language models. After semantic parsing (Figure 2), each message’s intent and entities are identified, then combined with predefined rules to infer session‑level events, categorized as public‑opinion events (e.g., over‑charging, lost goods, abusive agents) or safety events (e.g., traffic accidents, robbery).
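The step from message-level parses to session-level events can be sketched as follows. This is a minimal illustration, not HuoLala's implementation: the `ParsedMessage` schema, the intent names, and the contents of `RULES` are all hypothetical, and a rule here simply requires one intent plus a set of entity types.

```python
from dataclasses import dataclass, field

@dataclass
class ParsedMessage:
    """Output of semantic parsing for one message (hypothetical schema)."""
    speaker: str                                   # "user" or "agent"
    intent: str                                    # label from the text classifier
    entities: dict = field(default_factory=dict)   # entity type -> surface text

# Hypothetical rules: event -> (category, required intent, required entity types)
RULES = {
    "over_charging": ("public_opinion", "complain_fee", {"amount"}),
    "lost_goods": ("public_opinion", "report_loss", {"goods"}),
    "traffic_accident": ("safety", "report_accident", set()),
}

def infer_session_events(messages):
    """Return sorted (category, event) pairs triggered by any message in the session."""
    events = set()
    for msg in messages:
        for event, (category, intent, required) in RULES.items():
            # A rule fires when the intent matches and all required entities are present.
            if msg.intent == intent and required <= set(msg.entities):
                events.add((category, event))
    return sorted(events)

session = [
    ParsedMessage("user", "complain_fee", {"amount": "80 yuan"}),
    ParsedMessage("agent", "apologize"),
    ParsedMessage("user", "report_accident"),
]
print(infer_session_events(session))
# [('public_opinion', 'over_charging'), ('safety', 'traffic_accident')]
```

Keeping the rules in a plain table separates what the models predict from what the business considers an event, so rules can change without retraining.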
Algorithm Exploration and Practice
Problems and Challenges
Training data contains noisy labels, making cleaning costly.
How to obtain better semantic representations from pretrained models?
Large pretrained models have slow inference, hindering online deployment.
Deep models are black boxes; how to explain their predictions?
Data Denoising
Noise in datasets limits algorithm performance. Two denoising strategies were explored: confusion‑matrix‑based filtering and confidence‑learning‑based filtering, combined via k‑fold cross‑validation (Figure 4).
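As a rough illustration of the k-fold idea, the sketch below predicts every sample with a model that never saw it during training, then reads suspect labels off the off-diagonal cells of the confusion matrix. The nearest-centroid "model", the 1-D feature, and the toy data are assumptions made to keep the example self-contained; the article's actual filtering criteria are not specified here.

```python
from collections import Counter, defaultdict

def fit_centroids(xs, ys):
    """Toy stand-in for model training: per-class mean of a 1-D feature."""
    sums, counts = defaultdict(float), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in counts}

def predict(centroids, x):
    """Predict the class whose centroid is nearest to x."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))

def out_of_fold_predictions(xs, ys, k=3):
    """Predict each sample using a model trained on the other k-1 folds."""
    preds = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = predict(model, xs[i])
    return preds

xs = [0.1, 0.2, 0.3, 0.15, 0.9, 1.0, 1.1, 0.95, 0.25]
ys = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]  # the last label is deliberately noisy

preds = out_of_fold_predictions(xs, ys)
confusion = Counter(zip(ys, preds))          # (given label, predicted label) -> count
suspects = [i for i, (y, p) in enumerate(zip(ys, preds)) if y != p]
print(suspects)  # [8]
```

Off-diagonal cells with unusually high counts also point at class pairs that annotators tend to confuse, which can guide targeted relabeling.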
Confidence Learning Denoising
Confidence learning (confident learning in Northcutt et al.) accounts for both label noise and class imbalance, which makes denoising more robust. The process (Figure 6) obtains predicted probabilities for each sample, computes a probability threshold per class, and flags as noise the samples whose maximum class probability falls below the threshold.
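One way to make the thresholding concrete is the rule from the confident learning paper listed in the references: the threshold for class j is the mean predicted probability of class j over samples labeled j, and a sample is suspect when another class clears its own threshold more convincingly than the given label. The sketch below follows that reading; it may differ in detail from the pipeline in Figure 6, and the probabilities are made-up values standing in for out-of-fold model outputs.

```python
def class_thresholds(probs, labels, n_classes):
    """t_j = mean predicted probability of class j over samples labeled j."""
    thresholds = []
    for j in range(n_classes):
        own = [p[j] for p, y in zip(probs, labels) if y == j]
        # A class with no labeled samples can never be predicted confidently.
        thresholds.append(sum(own) / len(own) if own else float("inf"))
    return thresholds

def find_label_issues(probs, labels, n_classes):
    """Flag samples whose most confident class differs from the given label."""
    thresholds = class_thresholds(probs, labels, n_classes)
    issues = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # the model is unsure about this sample; leave it alone
        best = max(confident, key=lambda j: p[j])
        if best != y:
            issues.append(i)
    return issues

# Hypothetical out-of-fold probabilities for a two-class problem.
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.95, 0.05]]
labels = [0, 0, 1, 1, 1]  # the last label disagrees with a confident prediction

print(find_label_issues(probs, labels, n_classes=2))  # [4]
```

Because each class gets its own threshold, a rare class that the model scores with systematically lower probabilities is not penalized by a single global cutoff.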
Contrastive Learning
High-frequency tokens dominate BERT embeddings, harming semantic discrimination. Contrastive learning (Figure 8) pulls positive pairs together and pushes negative pairs apart. Sample pairs can be constructed with unsupervised strategies (e.g., ConSERT, embedding-level perturbations, SimCSE) or with supervised ones.
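The pull-together, push-apart objective is commonly written as an InfoNCE loss with in-batch negatives, which is the form SimCSE uses. The sketch below is a plain-Python illustration on hand-written 2-D vectors; the temperature of 0.05 and the toy embeddings are assumptions, and a real setup would compute the loss on encoder outputs in a deep learning framework.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss: positives[i] pairs with anchors[i];
    every other row of `positives` acts as an in-batch negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # each positive is close to its anchor
misaligned = [[0.1, 0.9], [0.9, 0.1]]   # positives swapped: pairs are far apart

print(info_nce(anchors, aligned) < info_nce(anchors, misaligned))  # True
```

The loss is low only when each anchor is more similar to its own positive than to the other sentences in the batch, which is what spreads the embeddings apart.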
Model Acceleration
Transformer‑based models suffer from slow inference. Acceleration techniques include ONNX/TensorRT inference, model pruning, knowledge distillation, and quantization.
Model Pruning
Many parameters in large models are redundant; pruning removes low-importance neurons or connections. Fine-grained pruning targets individual weights, while coarse-grained pruning removes entire modules, channels, or vocabulary entries (Figure 12).
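The two granularities can be contrasted on a small weight matrix. In this sketch, importance is assumed to be weight magnitude (for single weights) or row L1 norm (for whole neurons); these are common heuristics, not necessarily the criteria used in the system described here.

```python
def prune_weights(matrix, sparsity):
    """Fine-grained: zero the smallest-magnitude weights.
    Ties at the cutoff are all pruned, so slightly more than
    `sparsity` of the weights may be removed."""
    flat = sorted(abs(w) for row in matrix for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in matrix]
    cutoff = flat[k - 1]
    return [[0.0 if abs(w) <= cutoff else w for w in row] for row in matrix]

def prune_rows(matrix, keep):
    """Coarse-grained: keep the `keep` rows (e.g., neurons) with the largest L1 norm."""
    ranked = sorted(range(len(matrix)),
                    key=lambda i: sum(abs(w) for w in matrix[i]),
                    reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original row order
    return [matrix[i][:] for i in kept]

W = [
    [0.5, -0.01, 0.3],
    [0.02, 0.03, -0.01],
    [-0.7, 0.2, 0.05],
]

print(prune_weights(W, sparsity=0.5))
# [[0.5, 0.0, 0.3], [0.0, 0.0, 0.0], [-0.7, 0.2, 0.05]]
print(prune_rows(W, keep=2))
# [[0.5, -0.01, 0.3], [-0.7, 0.2, 0.05]]
```

Fine-grained pruning keeps the matrix shape and only speeds up inference with sparse kernels, whereas removing whole rows yields a smaller dense matrix that is faster on ordinary hardware.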
Knowledge Distillation
Distillation transfers knowledge from a large teacher model to a smaller student model (e.g., DistilBERT, TinyBERT). Knowledge can be distilled at input, feature, or output layers, as illustrated in Figure 14.
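For the output-layer case, a minimal sketch of the classic soft-label objective from Hinton et al. is shown below: the student matches the teacher's temperature-softened distribution and is also trained on the hard label. The temperature, the mixing weight `alpha`, and the logits are illustrative assumptions; feature-level and input-level distillation are not covered.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy(label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    soft *= temperature ** 2  # keeps soft-loss gradients on the same scale as the hard loss
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(close_student, teacher, label=0)
      < distillation_loss(far_student, teacher, label=0))  # True
```

The softened teacher distribution carries information about which wrong classes are nearly right, which a one-hot label cannot express.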
Model Quantization
Quantization converts FP32 computations to lower-precision formats (e.g., FP16) for selected operators, reducing memory and compute while largely preserving accuracy.
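The memory and precision trade-off of an FP32-to-FP16 cast can be checked with the standard library alone, since Python's `struct` module supports the IEEE 754 half-precision format code `e`. The weight values below are arbitrary examples; a production deployment would perform the conversion inside the inference engine rather than in Python.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

weights = [0.1, -1.2345678, 3.0, 1e-3]  # arbitrary example values
fp16_weights = [to_fp16(w) for w in weights]

# Storage cost: 4 bytes per FP32 value versus 2 bytes per FP16 value.
bytes_fp32 = len(struct.pack(f"<{len(weights)}f", *weights))
bytes_fp16 = len(struct.pack(f"<{len(weights)}e", *weights))

# FP16 keeps about 11 significant bits, so the relative rounding error stays small
# for values inside its normal range (largest finite FP16 value: 65504).
max_rel_error = max(abs(a - b) / abs(a) for a, b in zip(weights, fp16_weights))

print(bytes_fp32, bytes_fp16)  # 16 8
print(max_rel_error < 1e-3)    # True
```

Values outside the FP16 range or operators that accumulate many small terms are the usual reason only selected operators are converted.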
Interpretability
Deep models are black boxes; interpretability helps trust and optimize them. Two levels are discussed:
Instance-level explanation: identifies training samples that support or oppose a prediction (Figure 16).
Feature-level explanation: evaluates token importance via gradient-based methods or integrated gradients (Figure 17).
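Integrated gradients can be illustrated on a tiny logistic model, where the gradient is available in closed form. The sketch assumes three numeric input features, an all-zeros baseline, and a midpoint Riemann sum with 200 steps; in a text model the same procedure would run over token embeddings with gradients from automatic differentiation.

```python
import math

def model(x, w, bias):
    """Logistic model: sigmoid(w . x + bias)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x, w, bias):
    """Analytic gradient of the model output with respect to each input."""
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, bias, steps=200):
    """Average the gradient along the straight path from baseline to x,
    then scale by (x - baseline)."""
    avg_grad = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint rule
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, bias)):
            avg_grad[i] += g / steps
    return [(xi - b0) * g for xi, b0, g in zip(x, baseline, avg_grad)]

w, bias = [2.0, -1.0, 0.5], -1.0   # hypothetical model parameters
x = [1.0, 0.0, 2.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, bias)
# Completeness: the attributions sum to f(x) - f(baseline).
gap = sum(attributions) - (model(x, w, bias) - model(baseline, w, bias))
print(abs(gap) < 1e-4)  # True
```

The completeness check is a useful sanity test in practice: if the attributions do not sum to the change in output, the number of integration steps is too small.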
Conclusion and Outlook
This article presents the exploration and application of semantic understanding technologies in HuoLala's public-opinion business, addressing the challenges above with data denoising, contrastive learning, model acceleration, and interpretability. Future work will extend these techniques to more scenarios to reduce costs and improve efficiency.
References
Vaswani A, et al. Attention is all you need. NeurIPS 2017.
Devlin J, et al. BERT: Pre‑training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018.
Liu Y, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019.
Sun Y, et al. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019.
Sun Y, et al. ERNIE 3.0: Large‑scale Knowledge‑Enhanced Pre‑training. arXiv 2021.
Northcutt CG, et al. Confident Learning: Estimating Uncertainty in Dataset Labels. 2021.
Li B, et al. On the Sentence Embeddings from Pre‑trained Language Models. 2020.
Yan Y, et al. ConSERT: A Contrastive Framework for Self‑Supervised Sentence Representation Transfer. 2021.
Gao T, et al. SimCSE: Simple Contrastive Learning of Sentence Embeddings. 2021.
Li H, et al. Pruning Filters for Efficient ConvNets. arXiv 2016.
Michel P, et al. Are Sixteen Heads Really Better Than One? NeurIPS 2019.
Yang Z, et al. TextPruner: A Model Pruning Toolkit for Pre‑Trained Language Models. arXiv 2022.
Hinton G, et al. Distilling the Knowledge in a Neural Network. arXiv 2015.
Sanh V, et al. DistilBERT, a distilled version of BERT. arXiv 2019.
Jiao X, et al. TinyBERT: Distilling BERT for Natural Language Understanding. arXiv 2019.
Yeh CK, et al. Representer Point Selection for Explaining Deep Neural Networks. NeurIPS 2018.
Simonyan K, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv 2013.
Baehrens D, et al. How to Explain Individual Classification Decisions. arXiv 2009.
Sundararajan M, et al. Axiomatic Attribution for Deep Networks. ICML 2017.
