DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation and Zero‑Shot Transfer
DeltaLM is a multilingual pretrained encoder‑decoder model that pairs a pretrained encoder with a novel interleaved decoder for multilingual neural machine translation. It trains efficiently, transfers well across languages, supports zero‑shot translation, and delivers strong results on a range of translation and summarization tasks.
Introduction
Multilingual neural machine translation (MNMT) has attracted increasing research interest because pretrained multilingual models can greatly reduce annotation and training costs while enhancing cross‑language transfer. DeltaLM is proposed as a new multilingual pretrained model built on an encoder‑decoder architecture that inherits the cross‑language abilities of a pretrained encoder.
Key Topics Covered
The presentation reviews the machine‑translation roadmap, describes the MNMT framework, introduces the DeltaLM pretrained model, explains how it integrates with NMT, and discusses zero‑shot cross‑language transfer.
Training Data and Sampling
Training corpora combine multilingual sentence‑pair (parallel) data whose scale varies widely across language directions. A sampling strategy rebalances the mixture so that low‑resource language pairs are fairly represented alongside high‑resource ones.
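The summary does not spell out the exact balancing scheme, but temperature‑based sampling is the standard approach for this in multilingual NMT; the sketch below assumes that technique (the function name and the T=5 value are illustrative, not from the talk):

```python
def sampling_probs(pair_counts, temperature=5.0):
    """Temperature-based sampling over language pairs (a sketch).

    p_i is proportional to (n_i / N) ** (1 / T): T=1 keeps the raw data
    distribution, while larger T up-samples low-resource pairs.
    """
    total = sum(pair_counts.values())
    weights = {pair: (n / total) ** (1.0 / temperature)
               for pair, n in pair_counts.items()}
    z = sum(weights.values())
    return {pair: w / z for pair, w in weights.items()}

# A low-resource pair gets far more than its raw 1% share:
probs = sampling_probs({"en-de": 1_000_000, "en-sw": 10_000})
```

With T=5 the en‑sw pair is sampled roughly a quarter of the time instead of about 1% of the time, which is the "fair representation" the strategy aims for.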
DeltaLM Architecture
DeltaLM combines a pretrained encoder (e.g., XLM‑R) with a newly designed interleaved decoder that fully utilizes the encoder’s parameters. This design reduces training cost, preserves the encoder’s cross‑language knowledge, and decouples encoder and decoder for easier fine‑tuning.
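One way to picture "fully utilizing the encoder's parameters" is as a mapping from encoder layers to decoder sublayers. The toy mapping below assumes the interleaved design described in the DeltaLM paper, where alternating self‑attention and cross‑attention sublayers are each warm‑started from consecutive encoder layers; the module names are illustrative, not those of any released checkpoint:

```python
def interleave_init(num_encoder_layers):
    """Toy parameter mapping for an interleaved decoder (a sketch).

    Each pair of consecutive encoder layers seeds one decoder layer:
    the first supplies the self-attention sublayer, the second the
    cross-attention sublayer, so no decoder weight starts from scratch.
    """
    assert num_encoder_layers % 2 == 0, "needs an even number of encoder layers"
    mapping = {}
    for k in range(num_encoder_layers // 2):
        # self-attention sublayer from encoder layer 2k
        mapping[f"decoder.layer{k}.self_attn"] = f"encoder.layer{2*k}.self_attn"
        mapping[f"decoder.layer{k}.self_ffn"] = f"encoder.layer{2*k}.ffn"
        # cross-attention sublayer warm-started from encoder layer 2k+1
        mapping[f"decoder.layer{k}.cross_attn"] = f"encoder.layer{2*k+1}.self_attn"
        mapping[f"decoder.layer{k}.cross_ffn"] = f"encoder.layer{2*k+1}.ffn"
    return mapping
```

A 12‑layer encoder (as in XLM‑R base) would thus initialize a 6‑layer interleaved decoder with no randomly initialized attention blocks.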
Pretraining Tasks
Two pretraining objectives are used: (1) Span Corruption (T5‑style) on monolingual text, and (2) Translation‑Pair Span Corruption on bilingual data, which masks spans across concatenated sentence pairs so the model learns cross‑lingual alignment.
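The monolingual objective works as in T5: contiguous spans are replaced by sentinel tokens in the source, and the target reconstructs the masked spans. A minimal sketch, assuming T5‑style `<extra_id_k>` sentinels (span length and mask ratio here are illustrative defaults, not the paper's exact settings):

```python
import random

def span_corrupt(tokens, mask_ratio=0.15, span_len=3, seed=0):
    """T5-style span corruption (illustrative sketch).

    Masks contiguous spans in the source, replacing each span with a
    sentinel token; the target lists each sentinel followed by the
    tokens it hid, which is what the decoder learns to generate.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_ratio))
    masked = set()
    while len(masked) < n_to_mask:  # pick spans until enough is masked
        start = rng.randrange(len(tokens))
        masked.update(range(start, min(start + span_len, len(tokens))))
    source, target, sentinel, i = [], [], 0, 0
    while i < len(tokens):
        if i in masked:
            source.append(f"<extra_id_{sentinel}>")
            target.append(f"<extra_id_{sentinel}>")
            while i in masked:  # copy the whole masked span to the target
                target.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            source.append(tokens[i])
            i += 1
    return source, target
```

The bilingual variant applies the same corruption to a concatenated source–target sentence pair, so reconstructing a span often requires reading the other language.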
Two‑Stage Fine‑Tuning
Stage 1 fixes the encoder and embedding layers while fine‑tuning the decoder on bilingual data, preserving cross‑language transfer. Stage 2 unfreezes the encoder and continues fine‑tuning both encoder and decoder, optionally removing self‑attention residual connections to further improve language‑agnostic representations.
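The freeze/unfreeze schedule can be sketched as a function that decides, per parameter name, whether it is trainable in a given stage (a minimal sketch; the `encoder.`/`embed.` name prefixes are assumptions about the module layout, not the released code's names):

```python
def set_stage(param_names, stage):
    """Two-stage fine-tuning schedule (a sketch).

    Stage 1: freeze encoder and embeddings, train only the decoder,
    preserving the encoder's cross-lingual representations.
    Stage 2: unfreeze everything and fine-tune jointly.
    """
    frozen_prefixes = ("encoder.", "embed.") if stage == 1 else ()
    # str.startswith on an empty tuple is False, so stage 2 trains all
    return {name: not name.startswith(frozen_prefixes)
            for name in param_names}

trainable = set_stage(
    ["encoder.layer0.w", "embed.tokens", "decoder.layer0.w"], stage=1)
```

In a real PyTorch setup the returned flags would be written into each parameter's `requires_grad`; the same schedule then drives both stages by just switching the `stage` argument.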
Experimental Results
DeltaLM achieves competitive or superior BLEU scores on multilingual translation benchmarks (e.g., a 101‑language evaluation against Facebook's M2M model) while using far fewer parameters than models such as mT5‑XL. It also excels at cross‑lingual summarization (WikiLingua) and other text‑generation tasks, demonstrating strong zero‑shot capabilities.
Language Transfer Findings
Experiments show that languages within the same family transfer more effectively, suggesting that a single high‑resource language can benefit the entire language family, reducing the need for extensive parallel data.
Conclusion
DeltaLM’s pretrained encoder‑decoder architecture and novel pretraining tasks provide powerful cross‑language transfer and generation abilities, enabling efficient multilingual NMT and zero‑shot translation with significantly lower data and parameter requirements.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.