DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation
DeltaLM is a multilingual pretrained encoder‑decoder model that pairs a pretrained encoder with a novel interleaved decoder to exploit cross‑lingual transfer, is pretrained with span‑corruption and translation‑pair span‑corruption tasks, and uses a two‑stage fine‑tuning strategy to achieve strong zero‑shot and supervised translation performance across more than 100 languages.
DeltaLM is a new multilingual pretrained encoder‑decoder model designed to improve neural machine translation (NMT) by leveraging the cross‑lingual transfer ability of pretrained encoders.
The model combines a pretrained encoder (e.g., XLM‑R) with a novel interleaved decoder, enabling full reuse of encoder parameters and efficient training.
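Because each interleaved decoder block stacks self‑attention, cross‑attention, and two FFN sublayers, two consecutive encoder layers (each an attention plus an FFN) can supply all of its initial weights. The helper below is a minimal sketch of that correspondence; the function name, the block layout, and the exact odd/even pairing are illustrative assumptions, not DeltaLM's actual initialization code.

```python
def interleaved_init_map(num_encoder_layers):
    """Map each sublayer of an interleaved decoder block to the encoder
    layer whose weights would initialize it (hypothetical helper).

    Assumed block layout: [self-attn, ffn_1, cross-attn, ffn_2], so two
    consecutive encoder layers fill one decoder block -- here layer 2k
    seeds the self-attention half and layer 2k+1 the cross-attention
    half (one plausible pairing; the paper fixes a specific parity).
    """
    mapping = {}
    for block in range(num_encoder_layers // 2):
        mapping[(block, "self_attn")] = 2 * block       # encoder layer 2k
        mapping[(block, "ffn_1")] = 2 * block
        mapping[(block, "cross_attn")] = 2 * block + 1  # encoder layer 2k+1
        mapping[(block, "ffn_2")] = 2 * block + 1
    return mapping

# A 4-layer encoder fully initializes a 2-block interleaved decoder.
m = interleaved_init_map(4)
```

Full reuse falls out of this structure: every encoder parameter lands somewhere in the decoder, so nothing is trained from scratch.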
Two pretraining tasks are used: Span Corruption (T5‑style) on monolingual data and Translation Pair Span Corruption on bilingual data, allowing the model to learn both language modeling and cross‑language alignment.
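The corruption objective can be sketched in a few lines. The function below is an illustrative T5‑style masking routine, not DeltaLM's actual preprocessing: spans are passed in explicitly rather than sampled, and the `<extra_id_i>` sentinel names follow T5's convention. For the translation‑pair variant, the input would be a concatenated bilingual sentence pair instead of a monolingual sentence.

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption (illustrative sketch).

    Each (start, end) span is replaced in the source by a sentinel
    token; the target lists each sentinel followed by the tokens it hid,
    so the decoder learns to reconstruct the masked spans.
    """
    source, target = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        source.extend(tokens[cursor:start])
        source.append(sentinel)
        target.append(sentinel)
        target.extend(tokens[start:end])
        cursor = end
    source.extend(tokens[cursor:])
    return source, target

src, tgt = span_corrupt(["The", "cat", "sat", "on", "the", "mat"],
                        [(1, 2), (4, 6)])
# src: ['The', '<extra_id_0>', 'sat', 'on', '<extra_id_1>']
# tgt: ['<extra_id_0>', 'cat', '<extra_id_1>', 'the', 'mat']
```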
A two‑stage fine‑tuning strategy is proposed: first freeze the encoder and fine‑tune the decoder on bilingual data, then jointly fine‑tune encoder and decoder while removing self‑attention residual connections to enhance language‑agnostic representations.
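The freeze‑then‑unfreeze schedule can be sketched with a toy parameter class; this is an illustration of the strategy described above, not DeltaLM's training code, and `Param`/`set_stage` are hypothetical names.

```python
class Param:
    """Toy stand-in for a framework parameter (illustrative only)."""
    def __init__(self, value=0.0):
        self.value = value
        self.trainable = True

def set_stage(encoder_params, decoder_params, stage):
    """Two-stage fine-tuning schedule (sketch).

    Stage 1: freeze the encoder, update only the decoder.
    Stage 2: unfreeze the encoder and fine-tune jointly.
    """
    for p in encoder_params:
        p.trainable = (stage == 2)
    for p in decoder_params:
        p.trainable = True

enc = [Param() for _ in range(3)]
dec = [Param() for _ in range(3)]
set_stage(enc, dec, stage=1)   # only the decoder adapts to bilingual data
set_stage(enc, dec, stage=2)   # then everything is fine-tuned together
```

Freezing the encoder first protects the pretrained cross‑lingual representations from being disturbed while the freshly attached decoder is still poorly fitted.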
Extensive experiments on 100+ languages demonstrate that DeltaLM achieves competitive or superior performance to larger models (e.g., mT5) on multilingual MT, cross‑lingual summarization, and zero‑shot translation, while using significantly fewer parameters.
Conclusions: multilingual pretrained models can greatly reduce annotation and training costs for NMT and improve zero‑shot cross‑language transfer, with DeltaLM’s architecture and training objectives providing strong cross‑lingual generation capabilities.
DataFunSummit