
Intelligent Customer Service at Meituan: Dialogue Framework, Intent Understanding, Knowledge Discovery, and Emotion Recognition

Meituan's intelligent customer service builds on a dialogue interaction framework that combines machine‑learning‑driven intent mining and understanding, emotion detection, dialogue management, knowledge discovery, and multi‑task learning to handle both simple QA and complex multi‑turn tasks across its diverse service ecosystem.

DataFunTalk

Intelligent customer service is an AI system that interacts with users in natural language, analyzing intent to provide personalized assistance.

1. Dialogue Framework

When a user enters the service interface, the system first works out what the problem is and then routes it to the appropriate backend service. The framework consists of two main parts: an offline part (model training and knowledge‑base construction) and an online part (real‑time processing).

The online pipeline first runs basic feature extraction (tokenization, semantic tagging, sentiment analysis, and NER), then proceeds to intent understanding (domain classification, intent detection, and slot extraction). Once the intent is identified, dialogue management handles dialogue state tracking (DST) and decision making, and finally invokes the business service APIs.
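The online flow described above can be sketched as a chain of stages. This is a minimal illustration under stated assumptions, not Meituan's implementation: the `Turn` record, the three stage functions, and the keyword rule are hypothetical stand-ins for the real feature extractors and models.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Hypothetical record carried through the online pipeline."""
    text: str
    features: dict = field(default_factory=dict)
    domain: str = ""
    intent: str = ""
    action: str = ""


def extract_features(turn):
    # Stand-in for tokenization, semantic tagging, sentiment analysis, and NER.
    turn.features["tokens"] = turn.text.lower().split()
    return turn


def understand(turn):
    # Toy keyword rule in place of the domain, intent, and slot models.
    if "refund" in turn.features["tokens"]:
        turn.domain, turn.intent = "delivery", "request_refund"
    else:
        turn.domain, turn.intent = "general", "unknown"
    return turn


def manage_dialogue(turn):
    # Decide on an action: call a business API or ask for clarification.
    turn.action = "call_api:" + turn.intent if turn.intent != "unknown" else "clarify"
    return turn


def run_pipeline(text):
    turn = Turn(text)
    for stage in (extract_features, understand, manage_dialogue):
        turn = stage(turn)
    return turn
```

Keeping each stage as a function that takes and returns the same record makes it easy to swap a rule for a model without touching the other stages.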

2. Intent Understanding

Meituan serves many business scenarios. Users may enter a specific service window (clear domain) or the general portal (ambiguous domain). The system first performs domain classification, then intent classification, handling both single‑turn QA and multi‑turn task‑oriented dialogs.

Example of single‑turn QA:

U: Meituan delivery time

S: Delivery follows the merchant's business hours. If the merchant is open, ordering and delivery are available.
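One common way to answer a single‑turn query like this is retrieval over a FAQ base followed by similarity ranking, which is the approach Section 2.2 names for QA‑type intents. The sketch below assumes a bag‑of‑words cosine similarity and a tiny hypothetical FAQ list; a production system would more likely use learned sentence representations.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token-count Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_faq(query, faq):
    """Return (score, answer) pairs sorted from most to least similar.

    `faq` is a list of (question, answer) pairs; both names are hypothetical.
    """
    q = Counter(query.lower().split())
    scored = [
        (cosine(q, Counter(question.lower().split())), answer)
        for question, answer in faq
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The top‑ranked answer is returned to the user only if its score clears a confidence threshold; otherwise the system can fall back to clarification or a human agent.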

Example of multi‑turn task:

U: How to become a Meituan merchant?

Task‑oriented dialogs collect required slots and invoke corresponding APIs.

2.1 Domain Classification

Large volumes of business data are collected for training. Challenges include noisy labels (cross‑domain queries) and overlapping domains. An active‑learning loop is used: data collection → labeling → model training → prediction → human correction → retraining.
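The human‑correction step of this loop needs some rule for choosing which predictions to send to annotators. The source does not say which rule is used; the sketch below assumes simple uncertainty sampling, where the least confident predictions are reviewed first. The function name, `threshold`, and `budget` are hypothetical.

```python
def select_for_review(predictions, threshold=0.6, budget=2):
    """Pick the least confident predictions for human correction.

    predictions: list of (query, {domain: probability}) pairs.
    Returns up to `budget` (query, predicted_domain) pairs, least confident first.
    """
    uncertain = []
    for query, probs in predictions:
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence < threshold:
            uncertain.append((confidence, query, label))
    uncertain.sort()  # lowest confidence first
    return [(query, label) for _, query, label in uncertain[:budget]]
```

The corrected labels are then merged back into the training set before the next retraining round, which is what lets the loop gradually clean up noisy cross‑domain labels.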

Experiments show TextCNN runs within 10 ms for a ~15‑character query, while BERT takes ~70 ms; TextCNN is used in production.
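A latency comparison of this kind can be reproduced with a small timing harness. The sketch below is an assumption about how such a measurement might be set up: `predict` is any hypothetical callable wrapping a model, and the median over repeated runs is used to reduce noise.

```python
import statistics
import time


def median_latency_ms(predict, query, runs=50):
    """Median wall-clock latency of predict(query) in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def within_budget(predict, query, budget_ms):
    """True if the median latency fits the online serving budget."""
    return median_latency_ms(predict, query) <= budget_ms
```

With a real model, `within_budget(model.predict, query, 10.0)` would check the 10 ms figure quoted above for a query of about 15 characters, where `model.predict` is a placeholder for whichever classifier is being evaluated.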

2.2 Intent Classification

Intents fall into two categories: QA‑type intents (handled by retrieval and similarity ranking) and task‑type intents (handled by rule‑based grammars and machine‑learning models). For task‑type intents, multi‑task learning jointly models intent classification and slot filling using a Bi‑LSTM with a CRF layer.

Example: "Help me find a suitable Sichuan restaurant for 10 people tomorrow noon" → intent = reservation, slots = time, party size, cuisine.
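A sequence‑labeling model such as a Bi‑LSTM with a CRF layer emits one tag per token, and those tags still have to be turned into slot values. The sketch below assumes the common BIO tagging scheme (the source does not name the scheme) and hypothetical slot names; it covers only the decoding step, not the model.

```python
def decode_slots(tokens, tags):
    """Convert per-token BIO tags into a {slot_name: text} dictionary."""
    slots, name, words = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; flush the previous one if it is still open.
            if name:
                slots[name] = " ".join(words)
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            # Continuation of the currently open slot.
            words.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag closes any open slot.
            if name:
                slots[name] = " ".join(words)
            name, words = None, []
    if name:
        slots[name] = " ".join(words)
    return slots
```

For the reservation example, tagging "Sichuan" as `B-cuisine`, "10 people" as `B-party_size I-party_size`, and "tomorrow noon" as `B-time I-time` yields the three slots listed above.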

2.3 Dialogue State Tracking (DST)

DST uses the session context and NLU outputs (potentially multiple intents) together with auxiliary information (order, portal entry) to disambiguate the current domain and intent. If the domain is unclear, the system asks a clarification question.

Example: "Beef soup spilled" → domain = delivery, intent = "apply for food‑damage compensation", then follow the compensation workflow.
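The disambiguation step can be illustrated with a simple rescoring rule: NLU candidates get a boost when their domain matches the portal the user entered through or the domain of a recent order, and the system asks for clarification when no candidate is confident enough. The boost value, threshold, and context keys below are hypothetical; the source does not describe the actual scoring.

```python
def track_state(candidates, context, min_score=0.5, boost=0.2):
    """Pick a (domain, intent) from NLU candidates using auxiliary context.

    candidates: list of (domain, intent, score) tuples from NLU.
    context: dict that may hold "entry_domain" and "last_order_domain".
    """
    rescored = []
    for domain, intent, score in candidates:
        if domain == context.get("entry_domain"):
            score += boost
        if domain == context.get("last_order_domain"):
            score += boost
        rescored.append((score, domain, intent))
    if not rescored or max(rescored)[0] < min_score:
        # Domain is still unclear: ask the user to choose.
        options = sorted({domain for _, domain, _ in rescored})
        return {"action": "clarify", "options": options}
    _, domain, intent = max(rescored)
    return {"action": "proceed", "domain": domain, "intent": intent}
```

In the "Beef soup spilled" case, a recent delivery order would lift the delivery candidate above the threshold, so the compensation workflow starts without a clarification turn.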

3. Knowledge Discovery

The system combines supervised NLU with unsupervised clustering to extract new knowledge from logs. Human operators review and approve the generated knowledge.

3.1 Human‑in‑the‑Loop

When AI cannot resolve an issue, the conversation is handed over to a human agent. Unsupervised clustering (K‑means) groups similar queries, and the resulting clusters are used to build task trees.
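The clustering step can be sketched with a small K‑means over bag‑of‑words vectors. This is a simplified illustration: it seeds the centroids with the first `k` points for determinism and uses raw token counts, whereas a real system would likely use a library implementation, better initialization, and richer text representations.

```python
def vectorize(queries):
    """Turn each query into a token-count vector over a shared vocabulary."""
    vocab = sorted({t for q in queries for t in q.lower().split()})
    return [[q.lower().split().count(t) for t in vocab] for q in queries]


def kmeans(points, k, iterations=10):
    """Plain K-means; returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]  # simplistic seeding
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments
```

Each resulting cluster groups queries that human agents handled in similar ways, which gives operators a candidate task to review and turn into a task tree.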

Task trees (e.g., "How to apply for food‑damage compensation") define required slots such as delivery method, damage reason, and application status, which are collected via dialog or backend APIs.
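A task definition of this kind can be represented as a list of required slots, each marked with where its value comes from. The structure below is a hypothetical sketch: the field names, the prompt text, and the use of a plain dictionary in place of real backend API calls are all assumptions.

```python
TASK = {
    "name": "apply_food_damage_compensation",
    "slots": [
        {"name": "delivery_method", "source": "api"},
        {"name": "damage_reason", "source": "dialog",
         "prompt": "What went wrong with the order?"},
        {"name": "application_status", "source": "api"},
    ],
}


def next_step(task, filled, backend):
    """Fill API-backed slots, or return the next question to ask the user.

    filled: dict of slot values collected so far (updated in place).
    backend: dict standing in for backend API lookups.
    """
    for slot in task["slots"]:
        if slot["name"] in filled:
            continue
        if slot["source"] == "api":
            filled[slot["name"]] = backend[slot["name"]]
        else:
            return {"ask": slot["prompt"]}
    return {"execute": task["name"], "slots": filled}
```

Calling `next_step` after every user turn keeps the dialog moving: slots available from the backend are filled silently, and the user is only asked for what the system cannot look up.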

4. Emotion Recognition

Voice data from the customer hotline is processed to extract features (FFT, Mel‑filterbanks) and fed into models (LR, SVM, CNN, LSTM, VGGish with attention) to predict emotional states. Weak‑label learning addresses the problem that a single label may not reflect moment‑by‑moment emotion changes.
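Mel filterbanks space their filters evenly on the mel scale rather than in hertz, so that low frequencies get finer resolution. The sketch below uses the widely used conversion mel = 2595 · log10(1 + f / 700) and computes only the filter center frequencies; the function names are hypothetical, and the source does not state which mel variant or filter count was used.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz, high_hz):
    """Center frequencies (Hz) of n_filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_filters)]
```

Each center frequency anchors one triangular filter applied to the FFT power spectrum; the filter outputs, usually after a log, form the per‑frame features fed to the models.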

Model performance ranking, from lowest to highest: MFCC+LSTM < MFCC+CNN < VGGish+feature‑level attention < VGGish+decision‑level attention.
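Decision‑level attention can be read as a weighted vote over per‑segment predictions: each audio segment gets its own class probabilities, and attention weights decide how much each segment contributes to the call‑level result. The sketch below is an assumption about the general technique, not the exact architecture used; in a trained model the attention scores would come from a learned layer rather than being passed in.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decision_level_attention(segment_probs, attention_scores):
    """Combine per-segment class probabilities into one call-level prediction.

    segment_probs: list of per-segment probability lists (one entry per class).
    attention_scores: one unnormalized score per segment.
    """
    weights = softmax(attention_scores)
    n_classes = len(segment_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, segment_probs))
        for c in range(n_classes)
    ]
```

This is what makes the approach suited to weak labels: a short but strongly emotional segment can dominate the call‑level prediction even when most of the call is neutral, which a plain average over segments would wash out.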

5. Outlook

Multi‑turn context modeling and intent recommendation.

Multimodal voice‑text fusion, weak‑label learning, and emotion risk detection.

Topic extraction from dialog history, script recommendation, and agent assistance.

The above components aim to optimize both the user‑side (ToC) and the agent‑side (ToB) of Meituan's intelligent customer service loop.
