DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation

DeltaLM is a multilingual pretrained encoder‑decoder model that reuses a pretrained encoder for cross‑lingual transfer and pairs it with a novel decoder architecture. It is pretrained with span‑corruption and translation‑pair span‑corruption tasks and fine‑tuned in two stages, achieving strong zero‑shot and supervised translation performance across more than 100 languages.

DataFunSummit

DeltaLM is a new multilingual pretrained encoder‑decoder model designed to improve neural machine translation (NMT) by leveraging the cross‑lingual transfer ability of pretrained encoders.

The model combines a pretrained encoder (e.g., XLM‑R) with a novel interleaved decoder, enabling full reuse of encoder parameters and efficient training.
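
The parameter reuse described above can be pictured as a mapping from decoder sub-layers to encoder sub-layers. The sketch below is illustrative only. It assumes that each decoder layer is laid out as self-attention, feed-forward, cross-attention, feed-forward, and that it is initialized from two consecutive encoder layers. That layout, the function name `interleaved_init_map`, and the parameter names such as `encoder.layers.0.attn` are assumptions made for this example, not details given in this summary.

```python
# Hypothetical parameter names; they are not taken from any released checkpoint.

def interleaved_init_map(num_encoder_layers):
    """Map each decoder sub-layer to the encoder sub-layer that initializes it.

    Assumed layout of one decoder layer:
        self-attention -> FFN -> cross-attention -> FFN
    so every decoder layer consumes two consecutive encoder layers.
    """
    if num_encoder_layers % 2 != 0:
        raise ValueError("this sketch assumes an even number of encoder layers")

    mapping = {}
    for dec_idx in range(num_encoder_layers // 2):
        first, second = 2 * dec_idx, 2 * dec_idx + 1
        prefix = f"decoder.layers.{dec_idx}"
        # Self-attention and the first FFN come from the first encoder layer.
        mapping[f"{prefix}.self_attn"] = f"encoder.layers.{first}.attn"
        mapping[f"{prefix}.ffn_1"] = f"encoder.layers.{first}.ffn"
        # Cross-attention and the second FFN come from the next encoder layer.
        mapping[f"{prefix}.cross_attn"] = f"encoder.layers.{second}.attn"
        mapping[f"{prefix}.ffn_2"] = f"encoder.layers.{second}.ffn"
    return mapping


mapping = interleaved_init_map(4)
for dec_name, enc_name in mapping.items():
    print(f"{dec_name:<28} <- {enc_name}")
```

Under these assumptions every encoder sub-layer is used exactly once, which is one way to read "full reuse of encoder parameters": the decoder needs no randomly initialized attention or feed-forward weights.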

Two pretraining tasks are used: Span Corruption (T5‑style) on monolingual data and Translation Pair Span Corruption on bilingual data, allowing the model to learn both language modeling and cross‑language alignment.
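
To make the two objectives concrete, here is a minimal sketch of T5-style span corruption on a token list. The function `span_corrupt`, its default noise density and span length, and the `</s>` separator used to join a translation pair are assumptions for illustration. The sentinel format `<extra_id_N>` follows the T5 convention, and the closing sentinel that T5 appends to the target is omitted for brevity.

```python
import random


def span_corrupt(tokens, noise_density=0.15, span_len=3, seed=0):
    """Replace random spans with sentinels; return (model_input, target)."""
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n = len(tokens)
    # Number of tokens to mask, kept between 1 and n.
    budget = min(n, max(1, round(n * noise_density)))

    masked = [False] * n
    while sum(masked) < budget:
        start = rng.randrange(n)
        for i in range(start, min(n, start + span_len)):
            if sum(masked) < budget:
                masked[i] = True

    inputs, targets, sentinel = [], [], 0
    for i, tok in enumerate(tokens):
        if masked[i]:
            # A new sentinel opens every contiguous masked span.
            if i == 0 or not masked[i - 1]:
                marker = f"<extra_id_{sentinel}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
        else:
            inputs.append(tok)
    return inputs, targets


# Task 1: span corruption on a monolingual sentence.
mono = "DeltaLM reuses a pretrained multilingual encoder for generation".split()
print(span_corrupt(mono, seed=1))

# Task 2: translation pair span corruption. The two sides are concatenated
# (the "</s>" separator is an assumption) and corrupted as one sequence, so a
# masked span on one side can be recovered from the other language.
pair = "I like green tea".split() + ["</s>"] + "Ich mag grünen Tee".split()
print(span_corrupt(pair, noise_density=0.3, seed=1))
```

The only difference between the two tasks in this sketch is the input: the corruption procedure is the same, but the bilingual input lets the model use the translation as context when filling masked spans.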

A two‑stage fine‑tuning strategy is proposed: first freeze the encoder and fine‑tune the decoder on bilingual data, then jointly fine‑tune encoder and decoder while removing self‑attention residual connections to enhance language‑agnostic representations.
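
The schedule above can be sketched without any framework as a rule for which parameters are trainable in each stage, plus a switch for the self-attention residual connection. The names `trainable_flags` and `attention_sublayer`, and the `encoder.` name prefix, are hypothetical. The summary does not say in which layer or layers the residual connection is removed, so the sketch only shows what removing it means for a single sub-layer.

```python
def trainable_flags(param_names, stage):
    """Return {param_name: is_trainable} for fine-tuning stage 1 or 2.

    Stage 1: encoder frozen, only the remaining (decoder) parameters train.
    Stage 2: encoder and decoder train jointly.
    """
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    return {
        name: stage == 2 or not name.startswith("encoder.")
        for name in param_names
    }


def attention_sublayer(x, attn_out, keep_residual=True):
    """Combine a sub-layer input x with its self-attention output.

    With the residual connection the output is x + attn_out; without it the
    output is attn_out alone, so the input positions are not copied through.
    """
    if keep_residual:
        return [xi + ai for xi, ai in zip(x, attn_out)]
    return list(attn_out)


names = ["encoder.layers.0.attn", "decoder.layers.0.self_attn"]
print(trainable_flags(names, stage=1))
print(trainable_flags(names, stage=2))
print(attention_sublayer([1.0, 2.0], [0.5, 0.5], keep_residual=False))
```

In a framework such as PyTorch, the flags would typically be applied by setting `requires_grad` on each parameter before building the optimizer for that stage.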

Extensive experiments on 100+ languages demonstrate that DeltaLM matches or outperforms larger models (e.g., mT5) on multilingual MT, cross‑lingual summarization, and zero‑shot translation, while using significantly fewer parameters.

Conclusions: Multilingual pretrained models can greatly reduce annotation and training costs for NMT and improve zero‑shot cross‑lingual transfer. DeltaLM’s architecture and training objectives give it strong cross‑lingual generation capabilities.
