Cross‑Lingual Structured Sentiment Analysis with Data Augmentation and Auxiliary Tasks

Meituan's Voice Interaction team entered SemEval‑2022 Task 10, where annotated data for low‑resource languages is scarce and optimization is costly. Their system combines the cross‑lingual XLM‑RoBERTa model with multi‑task learning and two data‑augmentation strategies, and it took first place in the zero‑shot subtask and second place in the monolingual subtask.

Meituan Technology Team

Background

SemEval‑2022 Task 10 focuses on Structured Sentiment Analysis (SSA), which extracts opinion quadruples (Holder, Target, Expression, Polarity) from text. The competition provides seven datasets in five languages for a monolingual subtask, and evaluates three languages (Spanish, Catalan, Basque) in a zero‑shot cross‑lingual subtask. The authors’ system ranked second in the monolingual subtask and first in the cross‑lingual subtask.

Task and Evaluation

SSA is evaluated with Sentiment Graph F1 (SF1). SF1 requires exact span matching for Holder, Target, and Expression and a weighted true‑positive (WTP) that is non‑zero only when the polarity label is correct. The metric therefore penalises missing or mismatched spans and incorrect polarity.
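
As a rough illustration of the scoring logic described above, the following Python sketch computes precision, recall, and F1 over opinion tuples with strict exact matching. It is a simplification and not the official scorer: the `Opinion` layout and the `graph_f1` helper are hypothetical names, and every true positive is assumed to count as 1 instead of carrying a weight.

```python
from typing import List, NamedTuple, Optional, Tuple

# (start, end) token offsets; None when the element is absent.
Span = Optional[Tuple[int, int]]


class Opinion(NamedTuple):
    holder: Span
    target: Span
    expression: Span
    polarity: str


def graph_f1(gold: List[Opinion], pred: List[Opinion]) -> float:
    """Exact-match F1 over opinion tuples (assumption: weight 1 per match)."""
    gold_set, pred_set = set(gold), set(pred)
    # A prediction is a true positive only if all three spans AND the
    # polarity label are identical to a gold tuple.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [
    Opinion(None, (1, 2), (3, 4), "Positive"),
    Opinion(None, (6, 7), (8, 9), "Negative"),
]
pred = [
    Opinion(None, (1, 2), (3, 4), "Positive"),  # exact match
    Opinion(None, (6, 7), (8, 9), "Positive"),  # spans right, polarity wrong
]
score = graph_f1(gold, pred)
```

In this toy case one of two predictions matches, so a wrong polarity costs as much as a wrong span.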

Limitations of Prior Work

Typical pipelines treat Holder, Target, and Expression extraction as separate stages, causing error propagation and ignoring inter‑task dependencies. Barnes et al. (2021) [5] introduced a graph‑based dependency‑parsing model that captures dependencies among the four elements, but it suffers from two problems: (1) the pretrained language model (PLM) is not jointly trained with the graph parser, so PLM knowledge is under‑utilised; (2) the approach relies heavily on annotated data, which is scarce for the low‑resource languages (MultiBEU = 1 063 samples, MultiBCA = 1 174 samples).

Proposed End‑to‑End Model

The authors design a unified SSA model (Figure 2) with the following components:

Backbone encoder: XLM‑RoBERTa (large) provides multilingual representations that transfer across languages.

Sequence decoder: a BiLSTM layer adds sequential context.

Graph decoder: a bilinear attention matrix predicts dependency arcs that encode the quadruple structure.
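
The graph decoder's pairwise scoring can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the random vectors stand in for BiLSTM outputs over XLM‑RoBERTa embeddings, the helper names are hypothetical, and decoding by a fixed threshold is a simplification chosen for the example.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def bilinear_scores(states: List[Vector], weight: Matrix) -> Matrix:
    """Return scores[i][j] = states[i]^T * weight * states[j] for all token pairs."""
    # Project each token once: projected[j] = weight * states[j].
    projected = [
        [sum(w * x for w, x in zip(row, state)) for row in weight]
        for state in states
    ]
    return [
        [sum(a * b for a, b in zip(head, dep)) for dep in projected]
        for head in states
    ]


def predict_arcs(scores: Matrix, threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Keep every (head, dependent) pair whose score exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(scores)
        for j, value in enumerate(row)
        if value > threshold
    ]


rng = random.Random(0)
dim, n_tokens = 4, 3
# Stand-ins for contextual token states (assumption: random values).
states = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
weight = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
scores = bilinear_scores(states, weight)
arcs = predict_arcs(scores)
```

Because the weight matrix is not symmetric in general, the score for (i, j) differs from the score for (j, i), which is what lets the arcs carry direction.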

To alleviate data scarcity, two data‑augmentation strategies are applied:

DA1 – In‑Domain Data Merging: hotel‑review datasets from multiple languages (MultiBEU, MultiBCA, OpeNerES, OpeNerEN) and a Portuguese hotel‑review corpus are merged, exploiting shared lexical items such as “hotel” (English, Spanish, Catalan) and “hotela” (Basque).

DA2 – Masked Language Model Generation: a small proportion of tokens in labeled sentences are masked and regenerated with XLM‑RoBERTa fine‑tuned on the task data. Masks are never applied to Expression spans to preserve polarity.
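
The masking step of DA2 can be sketched as follows. This is only an illustration: the function name, the 15 % masking ratio, and the end‑exclusive span convention are assumptions, and the regeneration step (filling each mask with the fine‑tuned model's prediction) is left out.

```python
import math
import random
from typing import List, Tuple

MASK = "<mask>"  # mask token string used by XLM-RoBERTa tokenizers


def mask_outside_expressions(
    tokens: List[str],
    expression_spans: List[Tuple[int, int]],
    ratio: float = 0.15,  # assumption: the source only says "a small proportion"
    seed: int = 0,
) -> List[str]:
    """Mask a fraction of tokens while leaving Expression spans untouched.

    Spans are (start, end) token offsets with an exclusive end.
    """
    protected = {i for start, end in expression_spans for i in range(start, end)}
    candidates = [i for i in range(len(tokens)) if i not in protected]
    n_mask = min(len(candidates), math.ceil(ratio * len(tokens)))
    chosen = set(random.Random(seed).sample(candidates, n_mask))
    return [MASK if i in chosen else token for i, token in enumerate(tokens)]


tokens = "the staff was very friendly and the room was clean".split()
# "very friendly" and "clean" are the Expression spans in this toy sentence.
masked = mask_outside_expressions(tokens, expression_spans=[(3, 5), (9, 10)])
```

Keeping the Expression tokens fixed means the regenerated sentence can reuse the original polarity label without re‑annotation.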

Two auxiliary tasks that require no extra annotation are added:

Sequence labeling: token‑level prediction of Holder/Target/Expression tags provides additional supervision.

Polarity classification: each sentence is assigned a polarity label (Positive, Negative, Neutral) based on its opinion quadruples; an MLP classifier consumes the average‑pooled BiLSTM hidden states.
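
Since the auxiliary labels come from the existing annotation, they can be derived mechanically. The sketch below builds token‑level tags for the sequence‑labeling task from the spans of one sentence; the BIO scheme, the end‑exclusive spans, and the `spans_to_tags` helper are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def spans_to_tags(n_tokens: int, spans: Dict[str, List[Tuple[int, int]]]) -> List[str]:
    """Convert role -> [(start, end)] spans (end exclusive) into BIO tags."""
    tags = ["O"] * n_tokens
    for role, role_spans in spans.items():
        for start, end in role_spans:
            tags[start] = f"B-{role}"          # first token of the span
            for i in range(start + 1, end):
                tags[i] = f"I-{role}"          # remaining tokens of the span
    return tags


tokens = "I loved the breakfast buffet".split()
tags = spans_to_tags(
    len(tokens),
    {"Holder": [(0, 1)], "Expression": [(1, 2)], "Target": [(2, 5)]},
)
```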

The total loss is a weighted sum of the main SSA loss and the two auxiliary losses.
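
A small sketch of how the pieces could fit together: average pooling of hidden states, a cross‑entropy loss for the polarity head, and the weighted sum. The weights `w_seq` and `w_pol`, their default values, and the choice to leave the main loss unweighted are assumptions; the source does not state them.

```python
import math
from typing import List


def mean_pool(hidden_states: List[List[float]]) -> List[float]:
    """Average token vectors into one sentence vector."""
    n = len(hidden_states)
    return [sum(column) / n for column in zip(*hidden_states)]


def cross_entropy(logits: List[float], gold_index: int) -> float:
    """Negative log-likelihood of the gold class under a softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[gold_index]


def total_loss(ssa_loss: float, seq_loss: float, pol_loss: float,
               w_seq: float = 0.5, w_pol: float = 0.5) -> float:
    """Main SSA loss plus weighted auxiliary losses (weights are assumptions)."""
    return ssa_loss + w_seq * seq_loss + w_pol * pol_loss


# Two token vectors pooled into one sentence vector.
sentence_vector = mean_pool([[1.0, 3.0], [3.0, 5.0]])
# In a real model an MLP would map sentence_vector to three polarity logits;
# they are hard-coded here to keep the sketch self-contained.
polarity_loss = cross_entropy([0.0, 0.0, 0.0], gold_index=1)
loss = total_loss(ssa_loss=1.0, seq_loss=2.0, pol_loss=polarity_loss)
```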

Model Selection Experiments

Three multilingual PLMs (mBERT, XLM‑RoBERTa, and infoXLM) were compared, alongside a word2vec + BiLSTM baseline. XLM‑RoBERTa was chosen because it consistently outperforms mBERT, which the authors attribute to its larger pre‑training corpus and to the cross‑lingual pre‑training line of work that introduced Translation Language Modeling (TLM) [11]; XLM‑RoBERTa itself is trained with a multilingual MLM objective. Table 1 shows that XLM‑RoBERTa + BiLSTM achieves the highest average SF1 on the official monolingual validation set, surpassing the strongest baseline (mBERT + BiLSTM) by 6.7 %. Adding the BiLSTM layer (Cross & Huang 2016) [12] improves performance by 3.7 %.

For these experiments, the authors split the official training set into training and development portions, with the development portion matching the size of the official dev set.

Data Augmentation Results

DA1. Merging same‑domain hotel‑review corpora across languages yields a larger training set. Lexical overlap (e.g., “hotel” in English/Spanish/Catalan, “hotela” in Basque) and shared sentiment expressions (e.g., “excellent service”, “clean rooms”) make the merged data beneficial, especially for the smallest dataset (MultiBEU).

DA2. For each labeled sample, a few tokens are masked and regenerated by the task‑fine‑tuned XLM‑RoBERTa. Masks are avoided on Expression spans to prevent polarity drift.

Tables 2‑4 demonstrate that both DA1 and DA2 consistently raise SF1 on the official validation sets. DA2 provides a larger boost for the zero‑shot cross‑lingual subtask, while its impact on the monolingual subtask is modest.

Auxiliary Tasks Impact

The sequence‑labeling auxiliary task predicts a Holder/Target/Expression tag for each token, and the polarity‑classification task treats each sentence as a sentiment classification problem using an MLP on the average‑pooled BiLSTM hidden states. Adding these tasks gives the shared encoder extra supervision and yields higher SF1 on the development set (Table 5).

Comparison with Other Teams

Compared with the other participants, the proposed system attains the highest average SF1 on Subtask 2 (zero‑shot cross‑lingual), exceeding the runner‑up by 5.2 percentage points. On Subtask 1 (monolingual), it ranks first on four individual datasets (MultiBEU, MultiBCA, OpeNerES, OpeNerEN) and trails the overall leader by only 0.3 percentage points (Tables 6‑7).

Conclusion

The study demonstrates that a cross‑lingual pretrained encoder (XLM‑RoBERTa), combined with BiLSTM decoding, bilinear graph attention, in‑domain data merging, MLM‑based sample generation, and two auxiliary tasks, effectively addresses SSA under low‑resource conditions. The approach secured second place in the monolingual subtask and first place in the zero‑shot cross‑lingual subtask of SemEval‑2022 Task 10.

Figures

Overall framework
Data augmentation DA1 results
Data augmentation DA2 results
Sequence labeling task

References

[11] Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining. Advances in Neural Information Processing Systems, 32.
[12] James Cross and Liang Huang. 2016. Incremental parsing with minimal features using bi-directional LSTM. arXiv preprint arXiv:1606.06406.
[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[14] Timothy Dozat and Christopher D. Manning. 2016. Deep biaffine attention for neural dependency parsing. arXiv preprint arXiv:1611.01734.