Span-Level Dialogue Summarization via Distant Supervision and Machine Reading Comprehension (DSMRC‑S)

The paper reviews classic summarization models and then proposes DSMRC‑S, a span-level extractive method for dialogue summarization. DSMRC‑S casts the task as machine reading comprehension, trains on token‑level labels obtained through distant supervision, and selects answer spans by density. It achieves state‑of‑the‑art BLEU and ROUGE results on a large Meituan dialogue dataset.

Meituan Technology Team

1. Dialogue Summarization Background

With the rapid growth of textual data on the Internet, information overload has become a serious problem. Automatic summarization addresses it by condensing long texts into short summaries that preserve their key information. Dialogue summarization, a special case of text summarization, focuses on capturing the key information in multi‑turn conversations such as customer‑service calls, meetings, or chats.

2. Classic Text and Dialogue Summarization Models

2.1 Extractive Models

Lead‑3: selects the first three sentences of a document as the summary.

TextRank: builds a graph whose nodes are sentences and whose edges are weighted by sentence similarity, then ranks the sentences with a PageRank‑like algorithm.

Clustering: encodes sentences as vectors (e.g., with Skip‑Thought or Paragram embeddings), clusters them with K‑means or Mean‑Shift, and picks the sentence closest to each cluster centroid.

Neural Extractive Models: these include sequence‑labeling approaches (e.g., SummaRuNNer) and sentence‑ranking approaches (e.g., NeuSUM). SummaRuNNer uses a Bi‑GRU to obtain word‑ and sentence‑level representations and predicts a binary label (include or exclude) for each sentence. NeuSUM instead scores each sentence by its benefit to the summary and uses a unidirectional GRU to model dependencies between the sentences already selected.
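For concreteness, the two simplest extractive baselines above, Lead‑3 and TextRank, can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the regex tokenizer, the word‑overlap similarity (the normalization used in the original TextRank paper), the damping factor of 0.85, and the fixed 50 iterations are all our assumptions.

```python
import math
import re


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens; a deliberately simple tokenizer for illustration."""
    return re.findall(r"\w+", sentence.lower())


def lead3(sentences: list[str]) -> list[str]:
    """Lead-3 baseline: the first three sentences form the summary."""
    return sentences[:3]


def overlap_similarity(a: list[str], b: list[str]) -> float:
    """Shared-word count normalized by log sentence lengths (TextRank-style)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0.0:  # both sentences are a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences: list[str], top_k: int = 3,
             damping: float = 0.85, iters: int = 50) -> list[str]:
    """Rank sentences with weighted PageRank on a sentence-similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights; no self-loops.
    weights = [[overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iters):
        # Each node j spreads its score to neighbours in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]  # restore document order
```

The selected sentences are returned in document order rather than score order, which usually reads more naturally as a summary.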

2.2 Generative Models

Seq2Seq models have become the backbone of abstractive summarization. Recent work augments Seq2Seq with pre‑trained language models such as BERT.

Pointer‑Generator: combines attention‑based generation with a copy mechanism, which can copy words directly from the source, and adds a coverage loss; together these alleviate out‑of‑vocabulary (OOV) words and repetition.

Leader‑Writer: a hierarchical Transformer in which the Leader predicts a sequence of key points (e.g., background, conclusion) and the Writer generates the final summary conditioned on those points.
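The two mechanisms of the Pointer‑Generator reduce to short formulas. The output distribution mixes the vocabulary softmax with the attention distribution, P(w) = p_gen · P_vocab(w) + (1 − p_gen) · (sum of the attention weights on source positions holding w), and the coverage loss at each decoder step sums min(a_i, c_i), where c is the attention accumulated over previous steps. The sketch below uses plain dictionaries and lists in place of tensors; the function names are hypothetical, and it ignores batching and the networks that produce p_gen, P_vocab, and the attention weights.

```python
def final_distribution(p_gen: float, p_vocab: dict[str, float],
                       attention: list[float],
                       source_tokens: list[str]) -> dict[str, float]:
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * attention mass on positions holding w."""
    dist = {w: p_gen * p for w, p in p_vocab.items()}
    for a_i, w in zip(attention, source_tokens):
        # Copying: source words, including ones outside the vocabulary, gain mass.
        dist[w] = dist.get(w, 0.0) + (1.0 - p_gen) * a_i
    return dist


def coverage_loss(attention_steps: list[list[float]]) -> float:
    """Sum over decoder steps t of sum_i min(a_i^t, c_i^t), c^t = sum of earlier attention."""
    if not attention_steps:
        return 0.0
    coverage = [0.0] * len(attention_steps[0])
    loss = 0.0
    for attn in attention_steps:
        # Penalize attending again to source positions that are already covered.
        loss += sum(min(a, c) for a, c in zip(attn, coverage))
        coverage = [c + a for c, a in zip(coverage, attn)]
    return loss
```

A source word that is absent from P_vocab still receives (1 − p_gen) times its attention weight, which is how copying sidesteps the OOV problem; the coverage term is zero when each step attends to fresh positions and grows when attention returns to already‑covered ones.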

3. Span‑Level Extractive Summarization (DSMRC‑S)

3.1 Motivation

In Meituan’s customer‑service scenario, agents manually write call summaries, which is time‑consuming. Existing models have drawbacks: generative models can produce unstable output, while extractive models require sentence‑level labels. DSMRC‑S instead casts dialogue summarization as a machine‑reading‑comprehension (MRC) task that needs no annotation beyond the summaries agents already write.

3.2 Method

Two‑stage pipeline:

**Stage 1 – Token‑level supervision**: Each token in the dialogue is automatically labeled 1 if it appears in the human‑written summary, otherwise 0. A BERT‑based encoder predicts the probability of each token being part of the answer. The loss is a binary cross‑entropy over the token predictions.
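The distant‑supervision labeling and the training loss of Stage 1 can be sketched as follows. Assumptions: dialogues and summaries are already tokenized (the source does not specify the tokenizer or the exact matching rule; here a token counts as "appearing in the summary" if it occurs anywhere in it), and the per‑token probabilities would come from a sigmoid on top of the BERT encoder, which is not shown. `distant_labels` and `token_bce` are hypothetical names.

```python
import math


def distant_labels(dialogue_tokens: list[str], summary_tokens: list[str]) -> list[int]:
    """Label a dialogue token 1 if it occurs anywhere in the human-written summary, else 0."""
    summary_vocab = set(summary_tokens)
    return [1 if tok in summary_vocab else 0 for tok in dialogue_tokens]


def token_bce(probs: list[float], labels: list[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted token probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

Because this sketch uses set membership, every occurrence of a summary word is labeled 1, including frequent words that merely happen to appear in the summary. The resulting labels are noisy, which is the usual trade‑off of distant supervision: no manual annotation in exchange for imperfect targets.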

**Stage 2 – Density‑based span selection**: For any candidate span, density = (sum of token probabilities) / (span length). The algorithm enumerates all spans that do not cross speaker boundaries and selects the one with the highest density as the answer for each predefined question (e.g., user background, user request, solution).
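Stage 2 can be sketched as an exhaustive search over spans inside a single speaker turn, using prefix sums so that each span's probability mass costs O(1). One caveat: a span's density is the mean of its token probabilities, and a mean never exceeds its largest element, so with no length constraint the single most probable token always achieves the maximum density. The sketch therefore takes a `min_len` (and optional `max_len`) parameter; these constraints are our assumption, not a detail confirmed by the source, and `best_span` is a hypothetical name. Turn membership is passed as one turn id per token.

```python
from typing import Optional


def best_span(probs: list[float], turn_ids: list[int], min_len: int = 2,
              max_len: Optional[int] = None) -> Optional[tuple[int, int, float]]:
    """Return (start, end, density) of the densest span (end inclusive), or None.

    A span never crosses a speaker boundary: every token in it shares one turn id,
    and turn ids are assumed contiguous within a turn. min_len/max_len are
    hypothetical constraints added for this sketch.
    """
    n = len(probs)
    prefix = [0.0]
    for p in probs:
        prefix.append(prefix[-1] + p)  # prefix[k] = sum of probs[:k]
    best = None
    for start in range(n):
        end = start
        # Extend the span until the turn changes (or max_len is exceeded).
        while end < n and turn_ids[end] == turn_ids[start]:
            length = end - start + 1
            if max_len is not None and length > max_len:
                break
            if length >= min_len:
                density = (prefix[end + 1] - prefix[start]) / length
                if best is None or density > best[2]:
                    best = (start, end, density)
            end += 1
    return best
```

In DSMRC‑S this selection would run once per predefined question (user background, user request, solution, and so on), each with its own token probabilities. The search costs O(n · L) for n tokens and a longest turn of L tokens, which is cheap for typical dialogue turns.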

3.3 Experiments

Dataset: 400k real Meituan dialogues, each annotated with four key elements (background, request, solution, etc.).

Metrics: BLEU, ROUGE‑L (F1), and Distinct‑1.
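Two of the three metrics are simple enough to sketch directly. ROUGE‑L F1 is computed from the longest common subsequence (LCS) between candidate and reference, and Distinct‑1 is the fraction of unique unigrams among all generated tokens. This is a minimal illustration on pre‑tokenized input: the source does not state its tokenization, standard ROUGE toolkits add their own preprocessing, and BLEU is omitted because a correct implementation (n‑gram clipping, brevity penalty, smoothing) is best taken from an established library.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via a row-by-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: list[str], reference: list[str]) -> float:
    """Harmonic mean of LCS precision and LCS recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def distinct_1(outputs: list[list[str]]) -> float:
    """Unique unigrams divided by total unigrams across all generated outputs."""
    tokens = [tok for out in outputs for tok in out]
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Note that Distinct‑1 measures lexical diversity only: it does not compare against the reference, so it complements rather than replaces BLEU and ROUGE‑L.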

Results: DSMRC‑S outperforms all baselines (including S2S+Att, Pointer‑Generator, Leader‑Writer, and TDS‑SATM) by about 3% on BLEU and ROUGE‑L and by 3.9% on Distinct‑1. It also shows consistent gains across different key elements, numbers of dialogue turns, and summary lengths.

4. Conclusion and Future Directions

The paper first reviews classic extractive and generative summarization methods, then introduces a distantly supervised, span‑level extractive approach that achieves state‑of‑the‑art performance on real‑world dialogue data. Future work includes multi‑span answer extraction, prompt‑based generative dialogue summarization, and deeper modeling of dialogue structure.
