Debiasing Competition Solution: Multi‑hop i2i Graph Modeling for Advertising Recommendation

The winning KDD Cup 2020 Debiasing solution builds a heterogeneous item‑to‑item graph with click‑co‑occurrence and multimodal similarity edges, uses multi‑hop random walks to generate unbiased candidate samples, trains LightGBM with a popularity‑weighted loss, and aggregates scores to lift low‑popularity items. Together these steps mitigate selection and popularity bias, and the solution placed first among 1,895 teams.

Meituan Technology Team

The ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) hosts an annual competition that attracts both academia and industry. In KDD Cup 2020, the Debiasing track focused on eliminating selection and popularity bias in next‑item prediction for e‑commerce advertising.

Data analysis revealed two major biases: (1) Selection Bias – the exposure data used for training is a subset selected by the system, and (2) Popularity Bias – popular items receive disproportionate clicks, leading to a “Matthew effect”. The dataset contains over 1 M clicks, 100 k items, and 30 k users, with rich multimodal item features (text and image vectors).

To address these challenges, the team built a heterogeneous item‑to‑item (i2i) graph with two edge types: click‑co‑occurrence edges (weighted by time interval, user activity, and popularity penalties) and multimodal similarity edges (cosine similarity of text/image vectors). Multi‑hop random walks on this graph generated unbiased candidate samples, expanding the training set and mitigating selection bias.
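The graph construction and multi‑hop walk described above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the function names (`build_click_edges`, `add_similarity_edges`, `multi_hop_candidates`), the exact weighting formulas (inverse time gap, inverse log of user activity, inverse square root of popularity), and all parameter defaults are assumptions chosen for readability, since the source names the weighting factors but does not give their formulas.

```python
import math
import random
from collections import defaultdict


def build_click_edges(sessions, time_decay=0.5):
    """Build click-co-occurrence edges from per-user click lists.

    sessions: {user: [(item, timestamp), ...]}
    Returns {item_a: {item_b: weight}}. The weighting below is an assumed
    form of the time-interval, user-activity, and popularity penalties.
    """
    popularity = defaultdict(int)
    for clicks in sessions.values():
        for item, _ in clicks:
            popularity[item] += 1

    graph = defaultdict(lambda: defaultdict(float))
    for clicks in sessions.values():
        # Very active users contribute less per co-occurrence (assumption).
        activity_penalty = 1.0 / math.log(1.0 + len(clicks))
        for i, (a, ta) in enumerate(clicks):
            for j, (b, tb) in enumerate(clicks):
                if i == j or a == b:
                    continue
                # Clicks closer in time are more strongly related.
                time_w = 1.0 / (1.0 + time_decay * abs(ta - tb))
                # Popular items are down-weighted on both ends of the edge.
                pop_penalty = 1.0 / math.sqrt(popularity[a] * popularity[b])
                graph[a][b] += time_w * activity_penalty * pop_penalty
    return {a: dict(nbrs) for a, nbrs in graph.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def add_similarity_edges(graph, vectors, top_k=2, scale=0.1):
    """Add multimodal similarity edges to each item's top_k most similar items.

    vectors: {item: text or image embedding}. `scale` (hypothetical) balances
    similarity edges against click edges.
    """
    for a, va in vectors.items():
        sims = sorted(
            ((cosine(va, vb), b) for b, vb in vectors.items() if b != a),
            reverse=True,
        )
        for s, b in sims[:top_k]:
            if s > 0:
                graph.setdefault(a, {})
                graph[a][b] = graph[a].get(b, 0.0) + scale * s
    return graph


def multi_hop_candidates(graph, source, hops=2, walks=200, seed=0):
    """Run weighted random walks from `source` and rank items by visit count."""
    rng = random.Random(seed)
    visits = defaultdict(int)
    for _ in range(walks):
        node = source
        for _ in range(hops):
            nbrs = graph.get(node)
            if not nbrs:
                break
            items = list(nbrs)
            weights = [nbrs[item] for item in items]
            node = rng.choices(items, weights=weights, k=1)[0]
            if node != source:
                visits[node] += 1
    return sorted(visits, key=lambda item: (-visits[item], item))
```

With two hops, an item that never co‑occurred with the source item can still surface as a candidate through a shared neighbor, which is how the walk reaches beyond the items the system originally exposed.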

The modeling pipeline consists of three stages:

1. Construction of the i2i graph and multi‑hop walks to produce candidate samples.

2. Creation of i2i training pairs, automatic high‑order feature engineering, and training with LightGBM using a popularity‑weighted loss:

\[ L = -\sum_{i}\left[\alpha\cdot y_i\log(p_i) + (1-y_i)\log(1-p_i)\right] \]

where \(\alpha\) is inversely proportional to item popularity.

3. Aggregation of scores from multiple source items via max‑pooling and post‑processing that boosts low‑popularity items, improving NDCG@50_half.
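The weighted loss and the score aggregation can be illustrated with a small pure‑Python sketch. Everything here is an assumption made for illustration: the helper names, the `scale` constant in \(\alpha = \text{scale} / \text{popularity}\), and the threshold‑based boost rule are not taken from the source, which states only that \(\alpha\) is inversely proportional to popularity and that low‑popularity items are boosted after max‑pooling. Because the loss weights only the positive term, one plausible way to pass it to LightGBM is as per‑sample weights (for example the `sample_weight` argument of `LGBMClassifier.fit`), though the source does not say how the team implemented it.

```python
import math


def popularity_weights(labels, target_popularity, scale=1.0):
    """Per-sample weights: alpha = scale / popularity for positives, 1 for negatives."""
    return [
        scale / pop if y == 1 else 1.0
        for y, pop in zip(labels, target_popularity)
    ]


def weighted_log_loss(labels, probs, alphas, eps=1e-12):
    """L = -sum_i [alpha_i * y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]."""
    total = 0.0
    for y, p, a in zip(labels, probs, alphas):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= a * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def aggregate_scores(pair_scores, popularity, rare_threshold=10, boost=0.3):
    """Max-pool candidate scores over source items, then boost rare candidates.

    pair_scores: iterable of (source_item, candidate_item, score).
    Candidates with popularity <= rare_threshold get score * (1 + boost);
    this boost rule is a hypothetical stand-in for the post-processing step.
    """
    pooled = {}
    for _, cand, score in pair_scores:
        pooled[cand] = max(score, pooled.get(cand, float("-inf")))
    final = {
        cand: score * (1.0 + boost)
        if popularity.get(cand, 0) <= rare_threshold
        else score
        for cand, score in pooled.items()
    }
    return sorted(final, key=lambda cand: (-final[cand], cand))
```

For example, calling `aggregate_scores` with a popular candidate scored 0.9 and a rare candidate scored 0.8 ranks the rare one first under the default boost, and the popular one first when `boost=0.0`.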

The solution achieved 1st place out of 1,895 teams, with an NDCG@50_half score 6.0% higher than the runner‑up's. The same techniques were applied to Meituan’s search advertising system, where bias mitigation led to significant business improvements across the matching, ranking, and creative‑selection stages.

In summary, the solution reframes traditional user‑to‑item (u2i) CTR modeling as user‑to‑item‑to‑item (u2i2i) modeling over i2i pairs, enriches the candidate set via multi‑hop graph walks, and incorporates popularity penalties in both graph construction and the loss function. Together these steps mitigate both selection and popularity bias in large‑scale recommendation systems.
