
Fake News Detection with Multi‑level BERT Fusion at WSDM Cup 2019

In the WSDM Cup 2019 fake-news detection challenge, the Meituan Travel team secured second place by combining extensive data analysis, Chinese and English BERT fine-tuning, label-propagation augmentation, and a three-level fusion framework (blending, stacking, and a final LR model) that lifted weighted accuracy to 0.88156.

Meituan Technology Team

Feb 21, 2019


The 12th WSDM conference featured the WSDM Cup 2019 Fake News Detection task, where the Meituan Travel team achieved second place. Their solution combines extensive data analysis, preprocessing, data augmentation, and a multi‑level deep model fusion framework built on BERT.

1. Background

Rapid growth of online information has amplified the spread of false news, threatening social stability. The competition aims to accurately identify fake news by treating the problem as a Natural Language Inference (NLI) task.

2. Data Analysis

The provided dataset contains over 320,000 training samples and 80,000 test samples, each consisting of a pair of news headlines labeled as Agreed, Disagreed, or Unrelated. Initial analysis revealed severe class imbalance (Unrelated ≈ 70%, Disagreed < 3%) and a text‑length distribution mainly between 20–100 characters.

3. Preprocessing & Data Augmentation

To reduce noise, traditional Chinese characters were converted to simplified Chinese and stop words were removed. A label‑propagation augmentation method was then introduced: if headline A agrees with B and A agrees with C, then B is labeled as agreeing with C; pairs involving a disagreement are propagated analogously. Additionally, the two headlines in each pair were swapped, which doubles the training data.
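The two augmentation steps can be sketched roughly as follows. This is a minimal illustration, not the team's code: the function names, the lowercase label strings, and the exact disagreement rule (A agrees with B, A disagrees with C, so B disagrees with C) are assumptions, since the article only says mismatches are handled "similarly". Propagation here is a single hop.

```python
from itertools import combinations

def propagate_labels(pairs):
    """Derive new labeled pairs from (title_a, title_b, label) triples.

    Assumed rules (one hop only):
      A agrees with B and A agrees with C     -> B agrees with C
      A agrees with B and A disagrees with C  -> B disagrees with C (assumption)
    """
    agreed, disagreed = {}, {}
    for a, b, label in pairs:
        if label == "agreed":
            agreed.setdefault(a, set()).add(b)
        elif label == "disagreed":
            disagreed.setdefault(a, set()).add(b)

    # Track pairs that already exist (in either order) so nothing is duplicated.
    known = set()
    for a, b, _ in pairs:
        known.update({(a, b), (b, a)})
    new_pairs = []

    def add(b, c, label):
        if b != c and (b, c) not in known:
            new_pairs.append((b, c, label))
            known.update({(b, c), (c, b)})

    for a, partners in agreed.items():
        for b, c in combinations(sorted(partners), 2):
            add(b, c, "agreed")
        for b in sorted(partners):
            for c in sorted(disagreed.get(a, ())):
                add(b, c, "disagreed")
    return new_pairs

def swap_pairs(pairs):
    """Append the reversed copy of every pair, doubling the data."""
    return list(pairs) + [(b, a, label) for a, b, label in pairs]
```

Swapping would typically be applied after propagation so that the derived pairs are doubled as well; whether the team used that order is not stated in the article.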

4. Base Model

BERT, the state‑of‑the‑art bidirectional Transformer, was adopted as the base model because of its strong textual representation capability. Both Chinese and English pretrained BERT models were fine‑tuned on the augmented data.
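The article does not spell out how headline pairs were fed to BERT. The sketch below assumes the standard BERT sentence-pair layout (`[CLS] A [SEP] B [SEP]` with segment ids 0 and 1); the function name and the `max_len=128` default are illustrative assumptions, and tokenization is taken as already done.

```python
def build_pair_input(tokens_a, tokens_b, max_len=128):
    """Build BERT-style tokens and segment ids for a headline pair.

    max_len is an assumed budget; three positions are reserved for
    [CLS] and the two [SEP] markers.
    """
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Trim the longer headline one token at a time until the pair fits.
    while len(tokens_a) + len(tokens_b) > max_len - 3:
        longer = tokens_a if len(tokens_a) >= len(tokens_b) else tokens_b
        longer.pop()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], headline A and its [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids
```

With this layout the classifier head reads the `[CLS]` position and predicts one of the three labels.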

5. Multi‑level Deep Model Fusion Framework

Three fusion levels were employed to balance performance and computational cost. Level 1 used Blending on 25 fine‑tuned BERT models. Level 2 applied 5‑fold Stacking with traditional classifiers (SVM, LR, KNN, NB) to diversify predictions. Level 3 combined the stacked outputs with a linear model (LR). The overall training pipeline consisted of three stages: data split, generation of new training/testing sets, and final LR fusion with cross‑validation.
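The core mechanism behind the stacking level is out-of-fold prediction: each training sample is scored by a model that never saw it, and those scores become features for the next level. The sketch below shows that mechanism only, under stated assumptions: the helper names are hypothetical, and the mean predictor is a toy stand-in for the real base learners (fine-tuned BERT models, SVM, LR, KNN, NB).

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def out_of_fold_predictions(X, y, fit, predict, k=5):
    """Return one prediction per sample, made by a model trained without it."""
    oof = [None] * len(X)
    for held_out in kfold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in held_out:
            oof[i] = predict(model, X[i])
    return oof

# Toy base learner (assumption): always predicts the mean training target.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

X = list(range(10))
y = [float(v) for v in X]
meta_feature = out_of_fold_predictions(X, y, fit_mean, predict_mean, k=5)
```

Running this for every base learner yields one meta-feature column per learner; the next level is then trained on those columns rather than on the raw text.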

6. Experiments

6.1 Evaluation Metric

A weighted accuracy metric was used to mitigate class imbalance, assigning different weights to each class (Agreed = 1/15, Disagreed = 1/5, Unrelated = 1/16).
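As a rough sketch, the metric can be computed as the total weight of correctly classified samples divided by the total weight of all samples. That normalization is an assumption about the exact formula; the class weights are the ones quoted above, and the lowercase label strings are illustrative.

```python
from fractions import Fraction

# Class weights as quoted in the article.
CLASS_WEIGHTS = {
    "agreed": Fraction(1, 15),
    "disagreed": Fraction(1, 5),
    "unrelated": Fraction(1, 16),
}

def weighted_accuracy(y_true, y_pred, weights=CLASS_WEIGHTS):
    """Weight of correct predictions over total weight (assumed normalization)."""
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return float(correct / total)
```

Because a Disagreed sample carries roughly three times the weight of the other two classes, misclassifying the rarest class costs far more than its share of the data would suggest.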

6.2 Results

The best single BERT model achieved a weighted accuracy of 0.8675. Simple averaging of 25 BERT models raised this to 0.8770 (+0.95 pp), and weighted averaging gave 0.87702 (+0.952 pp). The proposed multi‑level fusion reached 0.88156 (+1.406 pp over the best single model), the largest gain of the three ensembling approaches.
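For reference, the two averaging baselines can be sketched as below for a single sample. The function name is hypothetical and the example weights are made up for illustration; the article does not say how the weights for weighted averaging were chosen.

```python
def average_probabilities(model_probs, weights=None):
    """Combine per-class probabilities from several models for one sample.

    model_probs: one probability list per model, all in the same class order.
    weights: optional per-model weights; None means simple averaging.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]

# Two hypothetical models scoring (agreed, disagreed, unrelated).
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
simple = average_probabilities(probs)
weighted = average_probabilities(probs, weights=[3.0, 1.0])
```

The predicted class is the index of the largest averaged probability. Both baselines apply one fixed weight per model, whereas the stacked levels learn the combination from out-of-fold predictions.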

7. Conclusion & Future Work

The study shows that thorough data analysis, targeted augmentation, and a hierarchical fusion strategy can substantially boost fake news classification performance. Future directions include further pre‑training BERT on news‑domain corpora to narrow the domain gap between Wikipedia‑based pre‑training and news text.


