How Suffix Prediction Boosts English‑Russian Neural Machine Translation Accuracy
Researchers introduce a suffix‑prediction mechanism for neural machine translation that generates stems and suffixes separately during decoding. The approach reduces out‑of‑vocabulary errors and morphological mistakes in English‑Russian translation, and it yields consistent improvements on both RNN and Transformer models across large‑scale news and e‑commerce datasets.
Abstract
Neural machine translation (NMT) models are limited by a fixed-size vocabulary, leading to many out‑of‑vocabulary (OOV) words, especially for morphologically rich languages such as Russian. Existing work mainly adjusts translation granularity or expands the vocabulary, but does not explicitly model morphology. This paper proposes a novel suffix‑prediction mechanism that predicts stems and suffixes separately during decoding, reducing data sparsity and morphological errors, and demonstrates stable improvements on both RNN‑based and Transformer‑based NMT systems over large‑scale datasets.
Research Background
Recent advances in NMT have shown superior performance over statistical machine translation. However, the fixed target‑side vocabulary (typically 30k‑50k words) cannot cover all forms of a morphologically rich language, causing OOV problems that severely affect translation quality.
Related Work
Previous approaches address OOV by adjusting translation granularity (subword or character‑level models) or by enlarging the target vocabulary with dynamic sub‑tables. While these methods reduce OOV rates, they do not explicitly model the morphological structure of the target language.
Neural Machine Translation
We evaluate our method on two mainstream NMT architectures: an RNN‑based encoder‑decoder (Bahdanau et al., 2015) and the Transformer (Vaswani et al., 2017).
Russian Stems and Suffixes
A Russian word can be split into a stem and a suffix; the suffix encodes number, case, gender, and other grammatical features. Once stems and suffixes are separated, the number of unique stems is far smaller than the number of full word forms, and the suffix inventory contains only a few hundred types, which alleviates data sparsity.
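The effect of this split on vocabulary size can be illustrated with a small sketch. The suffix list below is a hypothetical, hand‑picked toy inventory, and the longest‑match splitter `split_word` is an assumption made for illustration; the article does not say which stemmer or suffix list the authors used.

```python
# Toy suffix inventory: a hypothetical, hand-picked subset for illustration,
# not the inventory used in the paper.
TOY_SUFFIXES = ["ами", "ой", "а", "и", "у", "е"]


def split_word(word, suffixes=TOY_SUFFIXES, min_stem_len=2):
    """Split a word into (stem, suffix) using the longest matching suffix."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)], suffix
    # No known suffix matched: treat the whole word as the stem.
    return word, ""


# Inflected forms of two nouns ("book" and "table").
forms = [
    "книга", "книги", "книгу", "книге", "книгой", "книгами",
    "стол", "стола", "столу",
]

pairs = [split_word(w) for w in forms]
stems = {stem for stem, _ in pairs}
suffix_types = {suffix for _, suffix in pairs}

print(len(set(forms)), "word forms")
print(len(stems), "stems:", sorted(stems))
print(len(suffix_types), "suffix types (including the empty suffix)")
```

In this toy case the distinct word forms collapse onto two stems plus a small shared set of suffixes, which is the sparsity reduction described above. A real system would need a proper morphological analyzer or stemmer rather than naive string matching.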
Suffix Prediction Network
During decoding, each step first generates a stem using the standard NMT decoder. Then, a feed‑forward network takes the generated stem, the decoder hidden state, and the source context to predict the suffix. The final word is obtained by concatenating the stem and suffix.
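A minimal sketch of the suffix step, assuming a single hidden layer with a tanh activation and a softmax over the suffix inventory. The layer sizes, the `SuffixPredictor` class, the randomly initialized (untrained) weights, and the toy input vectors are all assumptions for illustration; the article only states that a feed‑forward network takes the stem, the decoder hidden state, and the source context as input.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SuffixPredictor:
    """Hypothetical one-hidden-layer feed-forward classifier over suffixes."""

    def __init__(self, input_dim, hidden_dim, suffixes, seed=0):
        rng = random.Random(seed)
        self.suffixes = suffixes
        self.w_hidden = make_matrix(hidden_dim, input_dim, rng)
        self.w_out = make_matrix(len(suffixes), hidden_dim, rng)

    def predict(self, stem_emb, dec_state, src_context):
        # Concatenate the three inputs named in the text into one feature vector.
        features = stem_emb + dec_state + src_context
        hidden = [math.tanh(h) for h in matvec(self.w_hidden, features)]
        probs = softmax(matvec(self.w_out, hidden))
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.suffixes[best], probs


# Toy inventory and toy vectors; in a real system these would come from
# the trained decoder and attention mechanism.
suffixes = ["", "а", "и", "у", "е", "ой", "ами"]
predictor = SuffixPredictor(input_dim=12, hidden_dim=8, suffixes=suffixes)

stem = "книг"               # stem emitted by the standard decoder at this step
stem_emb = [0.1] * 4        # embedding of the generated stem
dec_state = [0.2] * 4       # decoder hidden state
src_context = [-0.3] * 4    # attention-weighted source context

suffix, probs = predictor.predict(stem_emb, dec_state, src_context)
word = stem + suffix        # final surface form: stem and suffix concatenated
print(word)
```

Because the weights here are untrained, the chosen suffix is arbitrary; the sketch only shows the data flow from stem, decoder state, and source context to a distribution over suffixes and then to the concatenated word.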
Experiments
We conducted experiments on the WMT‑2017 English‑Russian news translation task (≈5.3M sentence pairs) and on a large e‑commerce dataset (≈50M sentence pairs). Results show that our suffix‑prediction system outperforms subword and character baselines on both RNN and Transformer models.
Conclusion
We present a simple yet effective method that improves NMT for morphologically rich target languages by explicitly modeling suffixes. The approach yields consistent gains on both RNN‑based and Transformer‑based systems across news and e‑commerce domains, and represents the first work to model suffixes directly in NMT.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
