
Decoupling Popularity Bias in Dual‑Tower Retrieval Models

The paper proposes CDAN, a dual‑tower retrieval model that separates item attribute and popularity representations via a Feature Decoupling Module with orthogonal embeddings, aligns head‑tail attribute distributions using MMD and contrastive learning, and jointly trains biased and unbiased towers, achieving higher tail recall, lower exposure concentration, and measurable online click‑through improvements.

DaTaobao Tech

Background: In large‑scale recommendation, a small fraction of popular items dominates exposure (e.g., the top 10% of items receive 63% of impressions). This produces two kinds of popularity bias: (1) a popularity distribution difference, where item ID embeddings memorize popularity signals; and (2) a long‑tail distribution difference, where sparse interaction logs yield poor representations for tail items.

Existing Solutions: Inverse propensity scoring (IPS) and causal‑graph inference mitigate the bias, but they bluntly suppress popularity information, ignoring its value for predicting user click‑through.
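The IPS idea these methods build on can be sketched as loss reweighting: each logged interaction is weighted by the inverse of its item's estimated exposure propensity, so tail items count more during training. A minimal sketch; `gamma` and the clipping cap are illustrative assumptions, not values from the paper:

```python
import numpy as np

def ips_weights(item_counts, gamma=1.0, clip=10.0):
    """Inverse propensity weights: rarely exposed items get larger
    training weight.

    item_counts: impression counts per item in the training log.
    gamma: correction strength (0 = no correction; illustrative).
    clip: cap on normalized weights to bound estimator variance.
    """
    propensity = item_counts / item_counts.sum()  # estimated exposure probability
    w = (1.0 / propensity) ** gamma
    return np.minimum(w / w.mean(), clip)         # normalize, then clip

counts = np.array([1000.0, 100.0, 10.0])          # head, mid, tail item
w = ips_weights(counts)                           # tail weight > head weight
```

The clipping step is the standard remedy for the high variance IPS suffers on extremely rare items; the blunt side effect the paper criticizes is that the popularity signal itself is discarded rather than modeled.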

Proposed Method: We redesign the item tower of a dual‑tower model to decouple item attribute and popularity representations. A Feature Decoupling Module (FDM) uses two MLPs to extract mutually orthogonal attribute and popularity embeddings. Orthogonality is enforced with an L2 regularizer, and a contrastive loss aligns the decoupled popularity vector with the true popularity signal.
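The decoupling mechanism can be sketched as two parallel MLP branches plus a penalty on the per‑item inner product of their outputs. The network sizes and the squared‑dot‑product form of the penalty are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Tiny two-layer MLP with ReLU: one branch of the decoupling module."""
    return np.maximum(x @ w1, 0.0) @ w2

def orthogonality_penalty(attr, pop):
    """Mean squared per-item inner product of the two embeddings.
    Minimizing this pushes attribute and popularity vectors toward
    orthogonality, so neither branch can encode the other's signal."""
    dot = np.sum(attr * pop, axis=1)
    return float(np.mean(dot ** 2))

x = rng.normal(size=(4, 8))                        # shared item-tower input
attr = mlp(x, rng.normal(size=(8, 16)), rng.normal(size=(16, 8)))
pop = mlp(x, rng.normal(size=(8, 16)), rng.normal(size=(16, 8)))
loss_orth = orthogonality_penalty(attr, pop)       # added to the training loss
```

The penalty is zero exactly when every item's attribute and popularity vectors are orthogonal; the contrastive term that ties `pop` to observed popularity is trained alongside it.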

Long‑Tail Alignment: To compensate for sparse tail data, we introduce unexposed (mostly tail) items and apply Maximum Mean Discrepancy (MMD) to align the attribute distributions of head and tail items. Instance‑level alignment is added via contrastive learning on user click sequences, encouraging items clicked by the same user to have similar attribute vectors.
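MMD measures the distance between two distributions via kernel mean embeddings; minimizing it over head‑ and tail‑item attribute embeddings pulls the two groups into the same region of the representation space. A minimal squared‑MMD sketch with an RBF kernel (the bandwidth and sample sizes are illustrative):

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=2.0):
    """Squared MMD between two embedding sets under an RBF kernel.
    In CDAN's setting, X and Y would be head-item and tail-item
    attribute embeddings; this value is added to the training loss."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(1)
head = rng.normal(0.0, 1.0, size=(64, 8))
tail = rng.normal(1.0, 1.0, size=(64, 8))   # shifted: distribution mismatch
same = rng.normal(0.0, 1.0, size=(64, 8))   # drawn from the head distribution
```

With these samples, `rbf_mmd2(head, tail)` comes out well above `rbf_mmd2(head, same)`, which is the property the alignment loss exploits: it is near its floor only when the two distributions match.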

Joint Training & Online Serving: The unbiased attribute tower and the biased popularity tower are trained jointly; at inference, their scores are combined with a learned weight to balance genuine interest against herd behavior.
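The serving-time combination can be sketched as relevance plus a weighted popularity score. Here `alpha` stands in for the learned weight, and the item vectors and scores are made-up values for illustration:

```python
import numpy as np

def serve_score(user_vec, attr_vec, pop_score, alpha):
    """Final retrieval score: inner-product relevance from the unbiased
    attribute tower plus an alpha-weighted popularity-tower score."""
    return float(np.dot(user_vec, attr_vec)) + alpha * pop_score

user = np.array([1.0, 0.0])
niche_attr, niche_pop = np.array([0.9, 0.1]), 0.1  # relevant but unpopular
head_attr, head_pop = np.array([0.5, 0.5]), 0.9    # less relevant, very popular

# With a small alpha the niche item outranks the popular one;
# a large alpha flips the order toward herd behavior.
```

Because the two signals stay separate until this final sum, the system can tune how much herd behavior to admit without retraining the relevance model.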

Experiments: Offline results on Hitrate@300 and C‑Ratio show that CDAN improves tail recall while reducing exposure concentration. Online A/B tests show +0.28% pCTR, +0.15% clicks per user, and a 7–8% reduction in top‑K exposure concentration, confirming the benefit of leveraging popularity bias rather than suppressing it.
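For readers unfamiliar with the two offline metrics, a small sketch of plausible definitions: hit rate at k counts how many known-relevant items land in the top‑k retrieval, and the concentration ratio here is assumed to be the share of top‑k slots taken by head items (the paper's exact C‑Ratio formula may differ):

```python
def hitrate_at_k(retrieved, relevant, k=300):
    """Fraction of relevant items appearing in the top-k retrieval."""
    top = set(retrieved[:k])
    return sum(1 for i in relevant if i in top) / len(relevant)

def c_ratio(retrieved, head_items, k=300):
    """Share of top-k slots occupied by head items (assumed definition
    for illustration). Lower means exposure is less concentrated."""
    top = retrieved[:k]
    return sum(1 for i in top if i in head_items) / len(top)

# Toy example with k=5: items 1 and 2 are head items, 2 and 9 are relevant.
hr = hitrate_at_k([1, 2, 3, 4, 5], [2, 9], k=5)      # 0.5
cr = c_ratio([1, 2, 3, 4, 5], {1, 2}, k=5)           # 0.4
```

Under these definitions, the paper's result reads as: hit rate improves for tail items while `c_ratio` drops.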

Analysis: t‑SNE visualization confirms that the attribute embeddings are unbiased, while the popularity embeddings separate head and tail items. A weight analysis shows that increasing the bias weight gradually lowers concentration and raises the tail hit rate.

Conclusion: Popularity bias reflects both herd behavior and genuine interest; a decoupled representation lets the system exploit it without over‑amplifying it, improving both relevance and diversity.

Tags: contrastive learning, recommendation systems, popularity bias, domain adaptation, dual-tower model, feature decoupling
Written by DaTaobao Tech, the official account of DaTaobao Technology.