Boosting Task-Oriented Dialogue with Heterogeneous Memory Networks

This paper introduces Heterogeneous Memory Networks (HMNs), combining context‑free and context‑aware memory modules to jointly process user queries, dialogue history, and knowledge bases, achieving state‑of‑the‑art performance on three task‑oriented dialogue datasets in both BLEU and F1 metrics.

Alibaba Cloud Developer

Overview

Human agents convey knowledge through language, whereas machines that rely only on dialogue history often generate safe but generic replies that lack the warmth of human conversation. To address this, Alibaba engineers propose Heterogeneous Memory Networks (HMNs), which handle user utterances, dialogue history, and background knowledge bases simultaneously to improve dialogue quality.

Problem Background

In task‑oriented dialogue, answering a user query requires understanding the utterance, retrieving relevant facts from databases or knowledge bases, and composing a response. Models that only learn from dialogue corpora tend to produce safe, generic answers and fail to incorporate essential knowledge.

Proposed Solution

HMNs consist of a context‑aware memory network and a context‑free memory network. The context‑aware component encodes and stores the serialized user utterances and dialogue history, while the context‑free component encodes and stores structured knowledge tuples. During response generation, each word is selected from one of three vocabularies: two small ones (history words and knowledge words) and one large general vocabulary.

Model Architecture

1. Encoder

The encoder transforms dialogue history and user queries into a context vector. Each token is represented by its word embedding combined with turn‑level and speaker information (e.g., (hello, t1, user)). The sequence of embeddings is fed into a context‑aware memory network to produce the context vector.
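
The token layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `serialize_history`, the `t<n>` tag format, and the choice to give every utterance its own turn index are assumptions, not details taken from the paper.

```python
from typing import List, Tuple

# One memory slot per token: (word, turn tag, speaker tag).
MemoryToken = Tuple[str, str, str]


def serialize_history(turns: List[Tuple[str, str]]) -> List[MemoryToken]:
    """Flatten (speaker, utterance) pairs into tagged tokens.

    Assumption: each utterance receives its own turn index, starting at 1.
    """
    tokens: List[MemoryToken] = []
    for turn_index, (speaker, utterance) in enumerate(turns, start=1):
        # Whitespace tokenization and lowercasing keep the sketch simple.
        for word in utterance.lower().split():
            tokens.append((word, f"t{turn_index}", speaker))
    return tokens


history = [("user", "hello"), ("system", "Hi there")]
tokens = serialize_history(history)
```

In a full model, each element of a tuple would be looked up in an embedding table and the results combined into one vector per memory slot.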

2. Context‑Aware Memory Network

Traditional memory networks lose sequential information. HMNs modify the end‑to‑end memory network by sharing weights between adjacent hops and adding a gating mechanism (inspired by bidirectional GRU) to preserve contextual dependencies. Queries are combined with hop outputs to produce refined representations.
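
To make the idea of gating concrete, here is a deliberately simplified sketch of a gated bidirectional pass over memory slots. It is not the paper's formulation: the scalar update gate, the `gate_weight` parameter, and the function names are all hypothetical, chosen only to show how a gate lets each slot's representation depend on its neighbors and their order.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_pass(slots: List[Vector], gate_weight: float) -> List[Vector]:
    """One directional pass: each output mixes the slot with a running state."""
    state = [0.0] * len(slots[0])
    outputs = []
    for slot in slots:
        # Scalar update gate computed from slot/state similarity
        # (a hypothetical parameterization, far simpler than a real GRU).
        similarity = sum(s * h for s, h in zip(slot, state))
        update = sigmoid(gate_weight * similarity)
        state = [(1.0 - update) * h + update * s for h, s in zip(state, slot)]
        outputs.append(state)
    return outputs


def contextualize(slots: List[Vector], gate_weight: float = 1.0) -> List[Vector]:
    """Sum a left-to-right and a right-to-left pass (expects a non-empty list)."""
    forward = gated_pass(slots, gate_weight)
    backward = gated_pass(slots[::-1], gate_weight)[::-1]
    return [[f + b for f, b in zip(fw, bw)] for fw, bw in zip(forward, backward)]


alone = contextualize([[1.0, 0.0]])
with_neighbor = contextualize([[1.0, 0.0], [0.0, 1.0]])
```

Comparing `alone[0]` with `with_neighbor[0]` shows that the same slot ends up with a different representation once a neighbor is present, which a plain bag-of-slots memory would not capture.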

3. Context‑Free Memory Network

This component mirrors the standard end‑to‑end memory network and stores structured knowledge tuples. The output of the context‑aware network serves as a query for the context‑free network, mimicking how humans first determine a response strategy and then retrieve missing facts.
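
For readers unfamiliar with end‑to‑end memory networks, a single read (one hop) can be sketched as follows. The sketch follows the standard formulation (attention over input memory, weighted sum over output memory, residual update of the query); the function names and the toy vectors are illustrative assumptions, and learned embedding matrices are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_hop(
    query: Vector, input_memory: List[Vector], output_memory: List[Vector]
) -> Tuple[Vector, List[float]]:
    """One hop: attend over input memory, read from output memory."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in input_memory]
    attention = softmax(scores)
    # Attention-weighted sum of the output-memory slots.
    read = [
        sum(p * slot[d] for p, slot in zip(attention, output_memory))
        for d in range(len(query))
    ]
    # The next hop's query is the old query plus what was read.
    next_query = [q + o for q, o in zip(query, read)]
    return next_query, attention


# Toy example: the query here stands in for the context-aware network's output.
query = [1.0, 0.0]
next_query, attention = memory_hop(
    query,
    input_memory=[[10.0, 0.0], [0.0, 10.0]],
    output_memory=[[1.0, 1.0], [2.0, 2.0]],
)
```

Because the query aligns with the first slot, almost all attention mass falls there and the read vector is close to the first output slot.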

4. Decoder

The decoder integrates HMNs with an RNN controller. At each decoding step, the controller queries the context‑aware memory to obtain a history‑word distribution and a global word distribution, then queries the context‑free memory to obtain a knowledge‑word distribution. A copy mechanism selects a word from the three distributions.

5. Copy Mechanism and Word Selection

The probability of copying a word from memory is derived from attention weights. If a word is not present in the memory, a special jump token is used. The final word is chosen based on the highest probability among the three vocabularies, with fallback rules for jump tokens.
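
The selection step can be illustrated with a small sketch. The fallback rule used here is an assumption, since the summary does not spell it out: if both memories point to the jump token, the word comes from the global vocabulary; otherwise the most probable non‑jump memory word wins. The token name `<jump>` and the function `select_word` are hypothetical.

```python
from typing import Dict, Tuple

JUMP = "<jump>"  # hypothetical name for the special jump token


def best(dist: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable word in a distribution and its probability."""
    word = max(dist, key=dist.get)
    return word, dist[word]


def select_word(
    history_dist: Dict[str, float],
    knowledge_dist: Dict[str, float],
    global_dist: Dict[str, float],
) -> str:
    history_word, history_p = best(history_dist)
    knowledge_word, knowledge_p = best(knowledge_dist)
    # Keep only memory candidates that are real words, not the jump token.
    candidates = [
        (p, w)
        for w, p in ((history_word, history_p), (knowledge_word, knowledge_p))
        if w != JUMP
    ]
    if not candidates:
        # Assumed fallback: both memories "jumped", so generate from the
        # global vocabulary instead of copying.
        return best(global_dist)[0]
    return max(candidates)[1]


word = select_word(
    history_dist={JUMP: 0.9, "hello": 0.1},
    knowledge_dist={"pizza_hut": 0.7, JUMP: 0.3},
    global_dist={"the": 0.6, "is": 0.4},
)
```

In this example the history memory jumps, so the knowledge word `pizza_hut` is copied into the response.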

Experiments and Results

Datasets

Three popular task‑oriented dialogue datasets are used: Key‑Value Retrieval, DSTC2, and dialog‑bAbI tasks. All datasets contain dialogues and associated knowledge bases (or ontology files for DSTC2).

Evaluation Metrics

BLEU – measures n‑gram overlap between generated and reference responses, used here as a proxy for fluency.

F1 – assesses accuracy of extracted knowledge facts.

Per‑response and per‑dialog accuracy – used on the bAbI dataset to evaluate exact match generation.
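
As a rough illustration of how the entity F1 and the two accuracy figures can be computed, consider the sketch below. The function names and the set‑based treatment of entities are assumptions; the paper's evaluation scripts may aggregate differently (for example, micro‑averaging over the whole test set).

```python
from typing import List, Sequence, Tuple


def entity_f1(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """F1 between the sets of predicted and gold knowledge entities."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def response_and_dialog_accuracy(
    dialogs: List[List[Tuple[str, str]]]
) -> Tuple[float, float]:
    """Each dialog is a list of (generated, reference) response pairs.

    Per-response accuracy counts exact matches over all responses;
    per-dialog accuracy counts dialogs in which every response matches.
    """
    total = sum(len(dialog) for dialog in dialogs)
    if total == 0:
        return 0.0, 0.0
    correct = sum(gen == ref for dialog in dialogs for gen, ref in dialog)
    perfect = sum(all(gen == ref for gen, ref in dialog) for dialog in dialogs)
    return correct / total, perfect / len(dialogs)


f1 = entity_f1(["pizza_hut", "5_miles"], ["pizza_hut", "no_traffic"])
per_response, per_dialog = response_and_dialog_accuracy(
    [[("ok", "ok"), ("bye", "bye")], [("ok", "ok"), ("yes", "no")]]
)
```

Per‑dialog accuracy is the stricter of the two: a single wrong response makes the whole dialog count as a failure.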

Baselines

SEQ2SEQ (LSTM)

SEQ2SEQ+Attention

Mem2Seq – incorporates multihop attention.

HMNs‑CFO – HMNs with only context‑free memory (no context‑aware component).

Results

HMNs achieve the best scores on most metrics across all three datasets, surpassing existing SOTA models. Example generation shows fluent sentences with correctly extracted knowledge.

Analysis

HMNs consistently outperform baselines, especially in F1, indicating effective knowledge extraction.

The context‑aware memory accelerates learning and yields higher accuracy than the context‑free variant.

Separating dialogue history and knowledge into distinct memories (HMNs‑CFO vs. Mem2Seq) proves beneficial.

Future Work

Performance on the weather domain of the Key‑Value Retrieval dataset is lower because each dialogue comes with a large number of knowledge tuples. Reducing the candidate tuples via matching improves F1, which suggests that scaling memory networks to massive knowledge bases requires preprocessing or more efficient retrieval strategies.
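
One simple way to reduce candidate tuples is to keep only those that mention a word from the dialogue. The sketch below illustrates that idea; the matching criterion (exact word overlap with the subject or object of a triple), the function name, and the toy weather triples are assumptions rather than the procedure used in the paper.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def filter_tuples(knowledge_base: List[Triple], dialogue_text: str) -> List[Triple]:
    """Keep triples whose subject or object appears as a word in the dialogue."""
    words = set(dialogue_text.lower().split())
    return [
        triple
        for triple in knowledge_base
        if words & {triple[0].lower(), triple[2].lower()}
    ]


knowledge_base = [
    ("boston", "monday", "rain"),
    ("seattle", "monday", "clear"),
]
candidates = filter_tuples(knowledge_base, "what is the weather in boston")
```

Only the `boston` triple survives, so the memory network attends over one slot instead of two; on a real knowledge base the reduction would be much larger.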

References

[1] Madotto, A., Wu, C.-S., & Fung, P. "Mem2Seq: Effectively incorporating knowledge bases into end‑to‑end task‑oriented dialog systems." arXiv preprint arXiv:1804.08217 (2018).

[2] Eric, M., & Manning, C. D. "Key‑value retrieval networks for task‑oriented dialogue." arXiv preprint arXiv:1705.05414 (2017).

[3] DSTC2 Handbook. http://camdial.org/~mh521/dstc/downloads/handbook.pdf

[4] Bordes, A., Boureau, Y.-L., & Weston, J. "Learning end‑to‑end goal‑oriented dialog." arXiv preprint arXiv:1605.07683 (2016).

[5] Sukhbaatar, S., Weston, J., & Fergus, R. "End‑to‑end memory networks." NIPS (2015).

[6] Cho, K., et al. "On the properties of neural machine translation: Encoder‑decoder approaches." arXiv preprint arXiv:1409.1259 (2014).

[7] Vinyals, O., & Le, Q. "A neural conversational model." arXiv preprint arXiv:1506.05869 (2015).

[8] Shang, L., Lu, Z., & Li, H. "Neural responding machine for short‑text conversation." arXiv preprint arXiv:1503.02364 (2015).
