Auxiliary Ranking Loss Enhances Classification Ability in Sparse‑Feedback CTR Prediction
This study investigates how adding an auxiliary ranking loss to click‑through‑rate (CTR) models not only improves ranking but also alleviates gradient‑vanishing for negative samples, thereby boosting the primary classification performance, especially under sparse positive‑feedback conditions.
Abstract
Recent works have introduced auxiliary ranking losses into CTR estimation models and reported significant performance gains, yet most attribute the improvements solely to enhanced ranking ability without examining the effect on classification. This paper analyzes the challenges of binary cross‑entropy (BCE) loss under sparse user feedback, identifies a gradient‑vanishing problem for negative samples, and shows that an auxiliary ranking loss mitigates this issue, leading to better classification.
Research Problem
CTR prediction is typically framed as a binary classification task optimized with BCE loss. Combining BCE with a ranking loss (e.g., the Combined‑Pair method) is common practice, but its impact on the core classification objective remains unclear. We compare a pure BCE model with a Combined‑Pair model on the public Criteo dataset, adjusting positive‑sample weights to simulate the feedback sparsity of real‑world systems.
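The Combined‑Pair objective pairs pointwise BCE with a pairwise RankNet‑style term over positive/negative pairs in a batch. The function below is an illustrative sketch only; the `alpha` weighting and the exhaustive batch‑level pairing are assumptions for clarity, not the paper's exact formulation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combined_pair_loss(logits, labels, alpha=1.0):
    """Sketch of a Combined-Pair objective: mean BCE over all samples
    plus a RankNet-style loss over every (positive, negative) pair."""
    # Pointwise BCE on the sigmoid of each logit.
    bce = 0.0
    for s, y in zip(logits, labels):
        p = sigmoid(s)
        bce += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    bce /= len(logits)

    # Pairwise RankNet term: positives should score above negatives.
    pos = [s for s, y in zip(logits, labels) if y == 1]
    neg = [s for s, y in zip(logits, labels) if y == 0]
    rank = 0.0
    if pos and neg:
        for sp in pos:
            for sn in neg:
                rank += math.log1p(math.exp(sn - sp))
        rank /= len(pos) * len(neg)

    return bce + alpha * rank
```

Setting `alpha=0` recovers the pure BCE baseline, which is how the two models compared in this study differ.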
Findings on Classification Ability
(1) Combined‑Pair achieves lower BCE loss on the validation set, indicating improved classification. (2) It also attains lower BCE loss on the training set, suggesting easier optimization. (3) Loss‑surface analysis shows that the Combined‑Pair model sits in a flatter region of the loss landscape than the BCE‑only model, consistent with the easier optimization observed in (2).
Gradient Analysis
We derive the gradients of the BCE and RankNet components for positive and negative samples. Under BCE, the gradient magnitude for a negative sample is proportional to the predicted CTR (pCTR) and becomes vanishingly small when pCTR is low, the common case under sparse feedback, while positive‑sample gradients remain large. The RankNet gradients for negative samples can be substantially larger than the BCE gradients when positives are sparse, providing a stronger learning signal.
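The gradient comparison can be made concrete with a small numeric check. For a negative sample (label 0), the BCE gradient with respect to the logit s equals sigmoid(s) = pCTR; the RankNet gradient for the same sample paired with a positive of logit s_pos equals sigmoid(s_neg - s_pos). The logit values below are hypothetical, chosen to mimic a sparse‑feedback regime where all pCTRs are low:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_grad_negative(s):
    # BCE gradient w.r.t. the logit of a NEGATIVE sample (y = 0):
    #   dL/ds = sigmoid(s) = pCTR, which vanishes as pCTR -> 0.
    return sigmoid(s)

def ranknet_grad_negative(s_neg, s_pos):
    # RankNet gradient w.r.t. the negative logit, paired with a positive:
    #   dL/ds_neg = sigmoid(s_neg - s_pos)
    return sigmoid(s_neg - s_pos)

# Sparse feedback: the model predicts low pCTR everywhere, so positive
# and negative logits are both very negative and close together.
s_neg, s_pos = -6.0, -5.5            # pCTRs of roughly 0.25% and 0.41%
print(bce_grad_negative(s_neg))          # ~0.0025: near-vanishing
print(ranknet_grad_negative(s_neg, s_pos))  # ~0.38: orders of magnitude larger
```

This mirrors the document's claim: the BCE signal for negatives collapses with pCTR, while the pairwise signal depends only on the score gap to a positive.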
Experimental Validation
Experiments on Criteo confirm that Combined‑Pair consistently yields lower BCE loss and higher AUC across positive‑sample sparsity levels, with larger gains as sparsity increases. Varying the weight λ between the BCE and ranking components reveals a trade‑off: moderately shifting weight toward the ranking loss improves both metrics, while excessive emphasis on ranking degrades performance.
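One simple way to realize the λ trade‑off is a convex combination of the two loss components. The parameterization below is an assumption for illustration (the paper's exact weighting scheme may differ), and the logits and labels are made‑up toy values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(logits, labels):
    # Mean binary cross-entropy over the batch.
    return sum(-(y * math.log(sigmoid(s)) + (1 - y) * math.log(1 - sigmoid(s)))
               for s, y in zip(logits, labels)) / len(logits)

def rank(logits, labels):
    # Mean RankNet loss over all (positive, negative) pairs.
    pos = [s for s, y in zip(logits, labels) if y == 1]
    neg = [s for s, y in zip(logits, labels) if y == 0]
    return sum(math.log1p(math.exp(sn - sp))
               for sp in pos for sn in neg) / (len(pos) * len(neg))

# Toy batch: one positive among several negatives (sparse positives).
logits = [0.3, -2.1, -1.7, -2.5]
labels = [1, 0, 0, 0]

# Sweep lambda: 1.0 is pure BCE; smaller values shift weight to ranking.
for lam in (1.0, 0.7, 0.4):
    total = lam * bce(logits, labels) + (1 - lam) * rank(logits, labels)
    print(f"lambda={lam}: combined objective {total:.4f}")
```

In the experiments described above, a moderate shift toward the ranking term helps both BCE loss and AUC, but pushing λ too far toward ranking hurts calibration and overall performance.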
Online Deployment
The Combined‑Pair method was deployed in three Tencent advertising scenarios (WeChat Channels, WeChat Moments, DSP). A/B tests demonstrated notable GMV and consumption improvements (e.g., +0.7% GMV in Moments) along with reduced online BCE loss. New‑ad cold‑start performance also benefited, with larger GMV gains for fresher ads.
References
This work is one of three papers from the team accepted at SIGKDD 2024, including "Understanding the Ranking Loss for Recommendation with Sparse User Feedback".
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.