
Decoupling Popularity Bias in Dual‑Tower Retrieval Models

The paper proposes CDAN, a dual‑tower retrieval model that separates item attribute and popularity representations via a Feature Decoupling Module with orthogonal embeddings, aligns head‑tail attribute distributions using MMD and contrastive learning, and jointly trains biased and unbiased towers, achieving higher tail recall, lower exposure concentration, and measurable online click‑through improvements.

DaTaobao Tech

Background: In large‑scale recommendation, a small fraction of popular items dominates exposure (e.g., the top 10% of items receive 63% of impressions). This produces two kinds of popularity bias: (1) a popularity distribution difference, where item ID embeddings memorize popularity signals; and (2) a long‑tail distribution difference, where sparse interaction logs yield poor representations for tail items.

Existing Solutions: Inverse propensity scoring (IPS) and causal‑graph inference mitigate the bias, but they bluntly suppress popularity information, ignoring its value for predicting user click‑through.
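The IPS idea these methods build on can be sketched as loss reweighting: each logged interaction is weighted by the inverse of its item's estimated exposure propensity, so tail items count more during training. A minimal sketch; `gamma` and the clipping cap are illustrative assumptions, not values from the paper:

```python
import numpy as np

def ips_weights(item_counts, gamma=1.0, clip=10.0):
    """Inverse propensity weights: rarely exposed items get larger
    training weight.

    item_counts: impression counts per item in the training log.
    gamma: correction strength (0 = no correction; illustrative).
    clip: cap on normalized weights to bound estimator variance.
    """
    propensity = item_counts / item_counts.sum()  # estimated exposure probability
    w = (1.0 / propensity) ** gamma
    return np.minimum(w / w.mean(), clip)         # normalize, then clip

counts = np.array([1000.0, 100.0, 10.0])          # head, mid, tail item
w = ips_weights(counts)                           # tail weight > head weight
```

The clipping step is the standard remedy for the high variance IPS suffers on extremely rare items; the blunt side effect the paper criticizes is that the popularity signal itself is discarded rather than modeled.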

Proposed Method: We redesign the item tower of a dual‑tower model to decouple item attribute and popularity representations. A Feature Decoupling Module (FDM) uses two MLPs to extract mutually orthogonal attribute and popularity embeddings. Orthogonality is enforced with an L2 regularizer, and a contrastive loss aligns the decoupled popularity vector with the true popularity signal.
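The decoupling mechanism can be sketched as two parallel MLP branches plus a penalty on the per‑item inner product of their outputs. The network sizes and the squared‑dot‑product form of the penalty are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Tiny two-layer MLP with ReLU: one branch of the decoupling module."""
    return np.maximum(x @ w1, 0.0) @ w2

def orthogonality_penalty(attr, pop):
    """Mean squared per-item inner product of the two embeddings.
    Minimizing this pushes attribute and popularity vectors toward
    orthogonality, so neither branch can encode the other's signal."""
    dot = np.sum(attr * pop, axis=1)
    return float(np.mean(dot ** 2))

x = rng.normal(size=(4, 8))                        # shared item-tower input
attr = mlp(x, rng.normal(size=(8, 16)), rng.normal(size=(16, 8)))
pop = mlp(x, rng.normal(size=(8, 16)), rng.normal(size=(16, 8)))
loss_orth = orthogonality_penalty(attr, pop)       # added to the training loss
```

The penalty is zero exactly when every item's attribute and popularity vectors are orthogonal; the contrastive term that ties `pop` to observed popularity is trained alongside it.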

Long‑Tail Alignment: To compensate for sparse tail data, we introduce unexposed (mostly tail) items and apply Maximum Mean Discrepancy (MMD) to align the attribute distributions of head and tail items. Instance‑level alignment is added via contrastive learning on user click sequences, encouraging items clicked by the same user to have similar attribute vectors.
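MMD measures the distance between two distributions via kernel mean embeddings; minimizing it over head‑ and tail‑item attribute embeddings pulls the two groups into the same region of the representation space. A minimal squared‑MMD sketch with an RBF kernel (the bandwidth and sample sizes are illustrative):

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=2.0):
    """Squared MMD between two embedding sets under an RBF kernel.
    In CDAN's setting, X and Y would be head-item and tail-item
    attribute embeddings; this value is added to the training loss."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(1)
head = rng.normal(0.0, 1.0, size=(64, 8))
tail = rng.normal(1.0, 1.0, size=(64, 8))   # shifted: distribution mismatch
same = rng.normal(0.0, 1.0, size=(64, 8))   # drawn from the head distribution
```

With these samples, `rbf_mmd2(head, tail)` comes out well above `rbf_mmd2(head, same)`, which is the property the alignment loss exploits: it is near its floor only when the two distributions match.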

Joint Training & Online Serving: The unbiased attribute tower and the biased popularity tower are trained jointly; at inference, their scores are combined with a learned weight to balance genuine interest against herd behavior.
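The serving-time combination can be sketched as relevance plus a weighted popularity score. Here `alpha` stands in for the learned weight, and the item vectors and scores are made-up values for illustration:

```python
import numpy as np

def serve_score(user_vec, attr_vec, pop_score, alpha):
    """Final retrieval score: inner-product relevance from the unbiased
    attribute tower plus an alpha-weighted popularity-tower score."""
    return float(np.dot(user_vec, attr_vec)) + alpha * pop_score

user = np.array([1.0, 0.0])
niche_attr, niche_pop = np.array([0.9, 0.1]), 0.1  # relevant but unpopular
head_attr, head_pop = np.array([0.5, 0.5]), 0.9    # less relevant, very popular

# With a small alpha the niche item outranks the popular one;
# a large alpha flips the order toward herd behavior.
```

Because the two signals stay separate until this final sum, the system can tune how much herd behavior to admit without retraining the relevance model.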

Experiments: Offline results on Hitrate@300 and C‑Ratio show that CDAN improves tail recall while reducing exposure concentration. Online A/B tests show +0.28% pCTR, +0.15% clicks per user, and a 7–8% reduction in top‑K exposure concentration, confirming the benefit of leveraging popularity bias rather than suppressing it.
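For readers unfamiliar with the two offline metrics, a small sketch of plausible definitions: hit rate at k counts how many known-relevant items land in the top‑k retrieval, and the concentration ratio here is assumed to be the share of top‑k slots taken by head items (the paper's exact C‑Ratio formula may differ):

```python
def hitrate_at_k(retrieved, relevant, k=300):
    """Fraction of relevant items appearing in the top-k retrieval."""
    top = set(retrieved[:k])
    return sum(1 for i in relevant if i in top) / len(relevant)

def c_ratio(retrieved, head_items, k=300):
    """Share of top-k slots occupied by head items (assumed definition
    for illustration). Lower means exposure is less concentrated."""
    top = retrieved[:k]
    return sum(1 for i in top if i in head_items) / len(top)

# Toy example with k=5: items 1 and 2 are head items, 2 and 9 are relevant.
hr = hitrate_at_k([1, 2, 3, 4, 5], [2, 9], k=5)      # 0.5
cr = c_ratio([1, 2, 3, 4, 5], {1, 2}, k=5)           # 0.4
```

Under these definitions, the paper's result reads as: hit rate improves for tail items while `c_ratio` drops.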

Analysis: t‑SNE visualization confirms that the attribute embeddings are unbiased, while the popularity embeddings separate head and tail items. A weight analysis shows that increasing the bias weight gradually lowers concentration and raises the tail hit rate.

Conclusion: Popularity bias reflects both herd behavior and genuine interest; a decoupled representation lets the system exploit it without over‑amplifying it, improving both relevance and diversity.

Tags: contrastive learning, recommendation systems, popularity bias, domain adaptation, dual-tower model, feature decoupling
Written by DaTaobao Tech, the official account of DaTaobao Technology.