
Fine-grained User Review Sentiment Classification: AI Challenger 2018 Champion's Approach

Cheng Huige's winning AI Challenger 2018 solution treated fine-grained Chinese review sentiment as a multi-class task over 20 aspects. It combined a high-capacity LSTM encoder with self-attention, word and character embeddings, simplified ELMo pre-training, diverse tokenizations, and a weighted seven-model ensemble (including BERT) to reach the competition's top F1 score.

Meituan Technology Team

Jan 25, 2019


Article No. 330

2019 Issue 008

In the 2018 AI Challenger global competition jointly organized by Meituan-Dianping, Innovation Works, Sogou and Meitu, the "Fine-grained User Review Sentiment Classification" track was won by Cheng Huige, a solo participant from Peking University now working at Meituan-Dianping. This article summarizes his ideas and experience.

Background

The competition featured two challenging tracks: fine-grained sentiment analysis of user reviews and autonomous driving perception. The sentiment track attracted the most participants, accounting for about one‑fifth of all teams.

The dataset consists of Chinese user reviews annotated along 20 fine-grained aspects grouped into 6 major categories, a rare and valuable resource for both research and industry.

1. Tools

Cheng used a custom training framework that supports both TensorFlow and PyTorch models. It builds on the open-source RNet and MnemonicReader implementations from HKUST, and a BERT-based model was added later for ensemble gains.

2. Overall Approach

The problem was treated as a 20‑aspect multi‑class classification task. An LSTM‑based end‑to‑end model was employed to encode the text.
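To make the framing concrete, here is a minimal sketch of how one review's labels and loss could be laid out: 20 aspects, each an independent 4-way choice, with the cross-entropy averaged over aspects. The raw label values (1, 0, -1, -2), their mapping to class indices, and all function names are assumptions for illustration, not taken from the champion's code.

```python
import math

NUM_ASPECTS = 20   # from the article
NUM_CLASSES = 4    # from the article: four classes per aspect

# Assumed raw label values; the mapping to class indices is illustrative.
RAW_TO_INDEX = {-2: 0, -1: 1, 0: 2, 1: 3}


def encode_labels(raw_labels):
    """Map one review's 20 raw labels to class indices in [0, 4)."""
    if len(raw_labels) != NUM_ASPECTS:
        raise ValueError("expected one label per aspect")
    return [RAW_TO_INDEX[v] for v in raw_labels]


def softmax(logits):
    """Numerically stable softmax over one aspect's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_aspect_loss(logits, targets):
    """Mean cross-entropy over aspects; logits has shape [20][4]."""
    losses = [-math.log(softmax(row)[t]) for row, t in zip(logits, targets)]
    return sum(losses) / len(losses)


def predict(logits):
    """Independent argmax per aspect."""
    return [max(range(NUM_CLASSES), key=row.__getitem__) for row in logits]
```

Each aspect is scored on its own, so a review can be positive on one aspect and unmentioned on another without the two decisions interfering.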

Data quality was identified as the key to performance. Large-scale Meituan review corpora were collected, and pre-training techniques such as ELMo and, later, BERT were tried.

3. Baseline Model

Drawing on experience from the Kaggle Toxic Comment Classification competition, the baseline used an LSTM encoder followed by pooling, a combination reported to outperform CNNs on long-text classification.
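As a rough illustration of the pooling step, the sketch below reduces a sequence of per-token encoder outputs to one fixed-size vector in three ways: max pooling, top-k pooling, and attention pooling. Top-k pooling is read here as concatenating the k largest values of each hidden dimension, which is one possible interpretation. The function names, the value of k, and the scoring vector (a stand-in for learned attention weights) are assumptions.

```python
import math


def max_pool(states):
    """states: list of T vectors of size H -> one vector of size H."""
    return [max(col) for col in zip(*states)]


def topk_pool(states, k=2):
    """Concatenate the k largest values of every hidden dimension (size k*H)."""
    pooled = []
    for col in zip(*states):
        top = sorted(col, reverse=True)[:k]
        top += [0.0] * (k - len(top))  # pad when the sequence is shorter than k
        pooled.extend(top)
    return pooled


def attention_pool(states, score_vector):
    """Softmax-weighted sum of states; score_vector stands in for learned weights."""
    scores = [sum(h * w for h, w in zip(state, score_vector)) for state in states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * state[d] for w, state in zip(weights, states))
            for d in range(len(states[0]))]


def combined_pool(states, score_vector, k=2):
    """Concatenate top-k and attention pooling into one feature vector."""
    return topk_pool(states, k) + attention_pool(states, score_vector)
```

The last helper corresponds to the top-k plus attention combination that the article reports as stronger than either max or attention pooling alone.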

4. Model‑level Optimizations

Self‑Attention was added to capture intra‑text relationships. The attention output was fused with the original LSTM output using either a Gate (RNet) or Semantic Fusion (MnemonicReader) mechanism.
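The two fusion options can be sketched in NumPy as follows. This is a simplified illustration, not the champion's implementation: it uses scaled dot-product scoring for the self-attention step, omits bias terms, and takes the weight matrices (`w_g`, `w_r`) as plain arguments where a real model would learn them. All names are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def self_attention(h):
    """h: (T, H) encoder outputs -> (T, H) context vectors (dot-product scoring)."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ h


def gate_fusion(h, c, w_g):
    """R-Net style gate: scale [h; c] element-wise by sigmoid([h; c] W_g)."""
    x = np.concatenate([h, c], axis=1)   # (T, 2H)
    return sigmoid(x @ w_g) * x          # w_g: (2H, 2H), output (T, 2H)


def semantic_fusion(h, c, w_r, w_g):
    """Mnemonic Reader style fusion unit: o = g * r + (1 - g) * h."""
    x = np.concatenate([h, c, h * c, h - c], axis=1)  # (T, 4H)
    r = np.tanh(x @ w_r)                              # candidate, (T, H)
    g = sigmoid(x @ w_g)                              # gate, (T, H)
    return g * r + (1.0 - g) * h
```

The gate variant doubles the feature width, while the semantic fusion variant keeps the original width and lets the gate decide how much of the encoder output to overwrite.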

5. Model Details

LSTM outperforms GRU.

Larger hidden sizes help: 400 outperforms 200, which in turn outperforms 100.

Top‑k Pooling + Attention Pooling beats standalone Max or Attention pooling.

Separate pooling and final FC layers per aspect improve performance.

Because each aspect has four classes, a larger parameter count is beneficial.

Triangular learning‑rate scheduling (inspired by BERT) gave the biggest boost.

Word + Char modeling combines token‑level and character‑level information, reducing UNK impact.

Large vocabularies (about 144,000 tokens with Jieba, 198,000 with SentencePiece), with embeddings pretrained using fastText, improve results further.

Only high‑frequency words are fine‑tuned; low‑frequency word vectors remain fixed.
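Of the findings above, the triangular learning-rate schedule is the easiest to pin down in code. The sketch below assumes the BERT-style shape: a linear warm-up to a peak followed by a linear decay to zero. The function name and the step counts and peak value in the usage line are hypothetical; the article does not give the settings actually used.

```python
def triangular_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warm-up to peak_lr over warmup_steps, then linear decay to zero."""
    if not 0 < warmup_steps < total_steps:
        raise ValueError("need 0 < warmup_steps < total_steps")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Hypothetical settings: 1,000 steps, 100 of them warm-up, peak rate 1e-3.
schedule = [triangular_lr(s, 1000, 100, 1e-3) for s in range(1001)]
```

In a training loop the returned value would be assigned to the optimizer's learning rate before each update.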

6. Pretrained Language Models

A simplified ELMo loss was implemented to avoid the heavy cost of the full model. The first LSTM layer was pretrained with this loss and then fine-tuned on the target task, after which the model converged in about one hour.
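The article does not spell out what was simplified, so the sketch below only shows the standard bidirectional language-model objective that ELMo-style pre-training is built on: the forward direction predicts the next token, the backward direction predicts the previous one, and the two negative log-likelihoods are averaged. The logits are passed in as plain lists where a real model would produce them from the LSTM states; all names are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bilm_loss(token_ids, fwd_logits, bwd_logits):
    """Average negative log-likelihood of a bidirectional language model.

    fwd_logits[t] scores the vocabulary for token t+1 given tokens 0..t.
    bwd_logits[t] scores the vocabulary for token t-1 given tokens t..T-1.
    """
    n = len(token_ids)
    nll, count = 0.0, 0
    for t in range(n - 1):                 # forward direction
        nll -= log_softmax(fwd_logits[t])[token_ids[t + 1]]
        count += 1
    for t in range(1, n):                  # backward direction
        nll -= log_softmax(bwd_logits[t])[token_ids[t - 1]]
        count += 1
    return nll / count if count else 0.0
```

Because this objective needs no sentiment labels, it can be trained on raw review text before the classifier is fine-tuned.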

7. Model Ensemble

Multiple tokenization granularities (Jieba and SentencePiece) were combined with Word + Char modeling to increase diversity. RNet, MnemonicReader and BERT structures added further variance. The best‑F1 checkpoint per aspect was selected, and aspect‑wise weighted averaging (weights derived from validation F1) produced a strong 7‑model ensemble.
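A minimal sketch of aspect-wise weighted averaging is shown below. The article says only that the weights were derived from validation F1; normalizing each model's per-aspect F1 so the weights sum to one is an assumption made here, as are the function and argument names.

```python
def aspect_weighted_ensemble(model_probs, val_f1):
    """Blend model outputs separately for every aspect.

    model_probs[m][a] : class-probability list of model m for aspect a
    val_f1[m][a]      : validation F1 of model m on aspect a
    Returns the predicted class index for each aspect.
    """
    num_models = len(model_probs)
    num_aspects = len(model_probs[0])
    predictions = []
    for a in range(num_aspects):
        total = sum(val_f1[m][a] for m in range(num_models))
        num_classes = len(model_probs[0][a])
        blended = [0.0] * num_classes
        for m in range(num_models):
            # Fall back to uniform weights if every model scored zero.
            w = val_f1[m][a] / total if total > 0 else 1.0 / num_models
            for c in range(num_classes):
                blended[c] += w * model_probs[m][a][c]
        predictions.append(max(range(num_classes), key=blended.__getitem__))
    return predictions
```

Weighting per aspect rather than per model lets a model that is weak overall still dominate on the aspects where it validates well.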

8. About BERT

Char‑based BERT did not surpass the simplified ELMo model due to the 512‑token limit and over‑fitting on short sequences. Further BERT optimization may yield better results.

9. Future Optimizations

Explore F1‑centric optimization, possibly via batch‑level reinforcement learning.

Investigate pre‑training LSTM‑based models with BERT‑style loss and assess Transformer‑based BERT performance on this dataset.
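To see why F1 is awkward to optimize directly, it helps to write the metric out: it is computed from counts over the whole evaluation set rather than per example, so it does not decompose into a per-sample loss. The sketch below assumes the score is the macro-averaged F1 over the four classes of each aspect, averaged again over aspects; the exact competition formula is not given in this article, and the function names are illustrative.

```python
def macro_f1(y_true, y_pred, num_classes=4):
    """Unweighted mean of per-class F1; a class with no support scores 0."""
    f1s = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes


def mean_aspect_f1(true_by_aspect, pred_by_aspect, num_classes=4):
    """Average the macro-F1 of every aspect."""
    scores = [macro_f1(t, p, num_classes)
              for t, p in zip(true_by_aspect, pred_by_aspect)]
    return sum(scores) / len(scores)
```

Under macro averaging, rare classes count as much as frequent ones, which is one reason cross-entropy training and the final score can diverge.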

Interview with Champion Cheng Huige

Q: How did you feel about the competition?

He noted the rapid evolution of AI techniques (ELMo, BERT) and emphasized continuous learning for engineers.

Q: Advice for newcomers?

Study deep-learning courses from top universities, and gain practical experience through projects, internships, or competitions such as AI Challenger and Kaggle.

Q: Why choose the fine‑grained sentiment track?

His prior experience with text classification made the task appealing.

Q: Most rewarding achievement?

Iterative improvements, especially the impact of the simplified ELMo model.

Q: What did you learn?

He transitioned from TensorFlow to PyTorch, appreciated pretrained language models, and built valuable connections, eventually joining Meituan‑Dianping.

