
Advances in Natural Language Generation: ProphetNet, Knowledge‑Enhanced Generation, Non‑Autoregressive Pre‑training, Long‑Text Modeling, and Efficient Attention

This talk presents recent research on natural language generation, covering the ProphetNet pre‑trained generation model, external‑knowledge integration for generation, non‑autoregressive pre‑training (BANG), the Poolingformer long‑text architecture, EL‑attention for faster decoding, and a new multi‑task generation benchmark.

DataFunSummit
Guest speaker Dr. Gong Yeyun from Microsoft Research Asia introduced a series of recent works on natural language generation (NLG) and their practical applications.

1. ProphetNet ("先知网络") : A pre‑trained generation model trained with a future n‑gram prediction objective — at each step it predicts the next n tokens simultaneously via an n‑stream self‑attention mechanism. Originally applied to ad‑keyword generation, it uses trie‑constrained decoding to guarantee that generated keywords belong to a fixed candidate set, and it performs strongly on Chinese and English tasks such as summarization, question generation, and dialogue response.
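Trie‑constrained decoding can be sketched in a few lines: build a prefix tree over the tokenized candidate keywords, and at each decoding step mask the vocabulary down to the children of the current prefix. The `Trie` class and the toy keyword set below are illustrative, not the talk's actual implementation.

```python
class Trie:
    """Prefix tree over tokenized candidate keywords."""

    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, tokens):
        node = self
        for t in tokens:
            node = node.children.setdefault(t, Trie())
        node.is_end = True

    def allowed_next(self, prefix):
        """Tokens the decoder may emit after `prefix`.

        An empty set means the prefix leads nowhere in the candidate set.
        """
        node = self
        for t in prefix:
            node = node.children.get(t)
            if node is None:
                return set()
        return set(node.children)


# Build the trie from a toy candidate keyword set.
trie = Trie()
for kw in [["cheap", "flights"], ["cheap", "hotels"], ["car", "rental"]]:
    trie.insert(kw)

# At each decoding step, restrict the softmax to the allowed tokens.
allowed = trie.allowed_next(["cheap"])  # returns {"flights", "hotels"}
```

Masking every token outside `allowed` to probability zero guarantees the decoded sequence is always a prefix of some candidate keyword.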

2. Knowledge‑Enhanced Generation : Incorporates external knowledge into the generation process, evaluated on the CommonGen dataset. Concept sets are enriched with prototype sentences retrieved from in‑domain or out‑of‑domain corpora; a scaling module down‑weights noisy prototype tokens and a prototype position indicator marks where each prototype token sits, improving generation quality on concept‑to‑sentence tasks.

3. Non‑Autoregressive Pre‑training (BANG) : Proposes pre‑training that bridges fully autoregressive and non‑autoregressive decoding, including a semi‑autoregressive middle ground. By predicting multiple future tokens at each step through a cross‑visible n‑stream self‑attention mechanism, the model achieves much lower decoding latency while maintaining strong performance on tasks such as SQuAD question generation, XSum summarization, and PersonaChat response generation.
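One way to picture the semi‑autoregressive middle ground is a block‑wise causal mask: each step emits a block of k tokens in parallel, and every token sees all earlier blocks. This is a simplified sketch of the spectrum between the two decoding regimes, not BANG's actual cross‑visible n‑stream implementation.

```python
def block_causal_mask(seq_len, block):
    """mask[i][j] == 1 iff position i may attend to position j.

    Tokens inside one block are generated in parallel and, in this sketch,
    are mutually visible; every block sees all earlier blocks.
    block=1       -> ordinary causal (fully autoregressive) masking
    block=seq_len -> everything visible (fully non-autoregressive)
    """
    return [[1 if j // block <= i // block else 0 for j in range(seq_len)]
            for i in range(seq_len)]


ar = block_causal_mask(4, 1)    # causal: each token sees itself and the past
semi = block_causal_mask(4, 2)  # semi-autoregressive: two tokens per step
nar = block_causal_mask(4, 4)   # non-autoregressive: all positions visible
```

Larger blocks mean fewer sequential decoding steps (lower latency) at the cost of predicting more tokens without seeing each other's final values.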

4. Long‑Text Modeling – Poolingformer : Introduces two‑level window attention: a first‑level sliding window performs full self‑attention over nearby tokens, while a second, wider window applies pooling to distant tokens before attending to them. This drastically reduces computation for sequences up to 16k tokens while preserving accuracy on NQ, TyDi QA, and arXiv summarization benchmarks.
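The saving comes from shrinking the key set each position attends to. The sketch below builds that key set for one position, treating tokens as scalars for simplicity; the function name and window sizes are illustrative, not Poolingformer's configuration.

```python
def two_level_keys(tokens, i, local, outer, stride):
    """Keys that position i attends to: full-resolution tokens inside the
    first-level window [i-local, i+local], plus mean-pooled blocks of
    `stride` tokens drawn from the rest of the wider second-level window
    [i-outer, i+outer]."""
    lo, hi = max(0, i - local), min(len(tokens), i + local + 1)
    olo, ohi = max(0, i - outer), min(len(tokens), i + outer + 1)
    first = tokens[lo:hi]                      # full attention, nearby tokens
    distant = tokens[olo:lo] + tokens[hi:ohi]  # distant tokens, to be pooled
    second = [sum(distant[j:j + stride]) / len(distant[j:j + stride])
              for j in range(0, len(distant), stride)]
    return first + second


tokens = [float(t) for t in range(4096)]
keys = two_level_keys(tokens, i=2048, local=128, outer=1024, stride=64)
# 257 full-resolution keys + 28 pooled keys = 285, versus 2049 raw keys
# for a single flat window of the same 1024-token reach.
```

Because pooling compresses the distant region by a factor of `stride`, the per‑position cost grows far more slowly with window size, which is what makes 16k‑token inputs tractable.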

5. Generation Acceleration – EL‑Attention : Avoids recomputing the per‑head key and value projections H·W_K and H·W_V at every decoding step. Since Q·(H·W_K)^T = (Q·W_K^T)·H^T, the key projection can be folded into the query, so the model caches only the shared hidden states H rather than per‑head keys and values. This yields substantial speed‑ups without degrading generation quality across Transformer, BART, and GPT families.
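The associativity identity behind this trick is easy to verify numerically: scoring queries against projected keys, Q·(H·W_K)^T, gives exactly the same attention scores as folding the projection into the query, (Q·W_K^T)·H^T. The plain‑Python matrix helpers and toy values below are only for checking the algebra.

```python
def matmul(A, B):
    """Naive matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def T(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


Q = [[1.0, 2.0], [3.0, 4.0]]               # decoder queries
Wk = [[0.5, 1.0], [1.5, -0.5]]             # key projection weights
H = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]   # cached hidden states

standard = matmul(Q, T(matmul(H, Wk)))  # project H into keys, then score
el = matmul(matmul(Q, T(Wk)), T(H))     # fold W_K into the query instead
# Both give the same score matrix, but the second form never materializes
# per-head keys from H, so the shared hidden-state cache suffices.
```

The same reasoning applies to the value side, which is why the per‑step K and V projections can be skipped entirely during decoding.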

6. Generation Benchmark : Constructs a comprehensive benchmark covering eight datasets and four tasks, providing a leaderboard that demonstrates the proposed models’ superiority over existing baselines.

The presentation concluded with a summary of the research contributions and an invitation for further discussion.

pretraining, Natural Language Generation, knowledge integration, non-autoregressive, efficient attention, long‑text modeling
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
