Exploitation and Exploration in Recommendation Systems: Bias Types, Mitigation Strategies, and Diversity Optimization
The article explains how recommendation systems balance exploitation and exploration, details various bias sources such as selection, exposure, conformity, and position bias, presents mitigation techniques like feature input, bias towers, and greedy algorithms, and discusses diversity‑focused exploration using DPP methods.
1 Introduction
Recommendation systems pursue two goals: exploitation and exploration.
In exploitation, the key task is computing relevance from user behavior data such as browsing, watching, and favoriting. Real-world data is often sparse, especially in cold-start scenarios, so the system must also explore to uncover latent interests and avoid overly similar recommendations.
2 Exploitation
In the ranking stage, current work focuses mainly on debiasing.
2.1 Bias
Model bias has many sources, including:
Selection bias – users choose which items to rate and tend to rate those they particularly like or dislike, so observed ratings (e.g., Douban scores) are not a representative sample.
Exposure bias (sample selection bias) – users only interact with items they see, so a lack of interaction does not mean dislike. What a user sees depends on several factors: 1) the recommender decides which items are shown; 2) users actively search for items of interest; 3) the user's background, such as friends and location, affects exposure; 4) popular items are more likely to be seen.
Conformity bias – user behavior is influenced by others, so observed actions may not reflect true preferences.
Position bias – interaction likelihood varies with item position; users favor top‑ranked items, and some platforms place high‑profit, low‑interest items early to boost clicks.
2.2 Solutions
1) Feature input – treat position as a feature during training; at prediction time use a default value assuming all items appear in the same position.
2) Bias tower – add a shallow tower (e.g., YouTube Recsys'19) that predicts bias from bias-related features such as position; its output is added to the main logit before the sigmoid during training, and at inference the position feature is treated as missing.
3) Greedy algorithm – predict each item at all positions and use a greedy algorithm to find the optimal combination (Deep Position‑wise Interaction Network, SIGIR 2021).
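The bias tower and the greedy step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from either cited paper: the function names `click_probability` and `greedy_slate` are hypothetical, the bias logit stands in for the shallow tower's output, and the greedy rule (fill positions from top to bottom with the best remaining item) is one simple way to search over a per-position prediction matrix.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def click_probability(main_logit, bias_logit=None):
    """Combine the main tower's logit with the bias tower's logit.

    Training: pass bias_logit (shallow tower output for the shown position).
    Serving: pass None, i.e. the position feature is treated as missing.
    """
    logit = main_logit if bias_logit is None else main_logit + bias_logit
    return sigmoid(logit)

def greedy_slate(ctr):
    """ctr[i, p] is the predicted click rate of item i at position p.

    Fill positions from top to bottom, each time taking the remaining
    item with the highest predicted click rate at that position.
    """
    ctr = np.asarray(ctr, dtype=float)
    n_items, n_positions = ctr.shape
    remaining = list(range(n_items))
    slate = []
    for pos in range(min(n_positions, n_items)):
        best = max(remaining, key=lambda i: ctr[i, pos])
        slate.append(best)
        remaining.remove(best)
    return slate

# Hypothetical predictions for 3 items at 2 positions.
example_ctr = [[0.30, 0.10],
               [0.20, 0.15],
               [0.25, 0.05]]
print(greedy_slate(example_ctr))
```

Adding the bias in logit space lets the main tower learn relevance while the shallow tower absorbs the position effect, so dropping the bias term at serving time leaves a position-independent score.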
3 Exploration
This stage is usually part of the re‑ranking phase and primarily addresses diversity.
The Determinantal Point Process (DPP) algorithm (Fast Greedy MAP Inference for DPP, NIPS 2018) assumes that if two items are highly similar, a user clicking one reduces the desire to click the other.
DPP constructs a kernel matrix such that the determinant of the submatrix indexed by a candidate subset scores that subset, balancing relevance and diversity; this gives a way to evaluate each subset.
The matrix L is built, schematically, as L = R + α·D, where R is the relevance term, D is the diversity term, and α is a hyper-parameter: 0 < α < 1 improves diversity, α > 1 harms it, and α = 1 corresponds to a standard Gaussian RBF kernel.
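One way to read L = R + α·D is sketched below. The article does not define R and D precisely, so the following is an assumption: R holds squared relevance scores on the diagonal, and D holds the off-diagonal terms q_i·q_j·exp(−d²/(2σ²)) of a Gaussian RBF similarity. The function name `build_kernel` and the parameters `relevance`, `features`, and `sigma` are hypothetical.

```python
import numpy as np

def build_kernel(relevance, features, alpha=1.0, sigma=1.0):
    """Assumed reading of L = R + alpha * D (not confirmed by the article)."""
    q = np.asarray(relevance, dtype=float)
    x = np.asarray(features, dtype=float)
    # Pairwise squared Euclidean distances between item feature vectors.
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    rbf = np.exp(-sq_dist / (2.0 * sigma ** 2))
    R = np.diag(q ** 2)               # relevance term (diagonal only)
    D = np.outer(q, q) * rbf - R      # similarity term (zero diagonal)
    return R + alpha * D

# Two near-duplicate items with equal relevance.
relevance = [1.0, 1.0]
features = [[0.0, 0.0], [0.1, 0.0]]
for alpha in (0.5, 1.0, 2.0):
    L = build_kernel(relevance, features, alpha=alpha)
    print(alpha, np.linalg.det(L), np.linalg.eigvalsh(L).min())
```

Under this reading, α = 1 gives the plain RBF kernel scaled by relevance, and the near-duplicate pair gets a determinant close to zero. With α = 0.5 the same pair scores higher because the off-diagonal similarity is shrunk, while α = 2 yields a negative eigenvalue, so the matrix is no longer positive semi-definite.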
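Finding the subset whose submatrix has the largest determinant is NP-hard, so greedy algorithms are employed. Because L is positive semi-definite, it can be decomposed (e.g., Cholesky) into a lower-triangular matrix, which allows the determinant to be updated incrementally as items are added.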
Optimizations reduce the computational complexity from O(n³) to O(nk), but the matrix must remain positive semi‑definite; if any eigenvalue of L is negative, it is set to zero.
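The greedy selection and the eigenvalue clipping described above can be sketched as follows. This is an illustrative version written for this article, not the cited paper's reference code; the names `project_to_psd` and `greedy_map` are hypothetical, and the toy kernel at the end is made up for the example.

```python
import numpy as np

def project_to_psd(L):
    """Set negative eigenvalues to zero and rebuild the matrix."""
    sym = (L + L.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, 0.0, None)
    return (eigvecs * clipped) @ eigvecs.T

def greedy_map(L, k, eps=1e-10):
    """Greedily pick up to k items, each maximizing the determinant gain.

    Keeps rows of a Cholesky-style factor in c, so each iteration costs
    O(n * k) instead of recomputing a determinant per candidate.
    """
    L = np.asarray(L, dtype=float)
    n = L.shape[0]
    k = min(k, n)
    c = np.zeros((k, n))
    d2 = np.diag(L).copy()            # current marginal gain of each item
    j = int(np.argmax(d2))
    if d2[j] < eps:
        return []
    selected = [j]
    while len(selected) < k:
        t = len(selected) - 1
        # New factor row for the item just selected.
        e = (L[j, :] - np.dot(c[:t, j], c[:t, :])) / np.sqrt(d2[j])
        c[t, :] = e
        d2 = d2 - e ** 2              # update marginal gains
        d2[selected] = -np.inf        # never pick an item twice
        j = int(np.argmax(d2))
        if d2[j] < eps:
            break
        selected.append(j)
    return selected

# Toy kernel: items 0 and 1 are near-duplicates, item 2 is different.
toy_L = np.array([[1.00, 0.81, 0.080],
                  [0.81, 0.81, 0.072],
                  [0.08, 0.072, 0.640]])
print(greedy_map(toy_L, 2))
```

In the toy kernel, ranking by relevance alone would pick items 0 and 1, but the greedy step skips item 1 because it adds little to the determinant once item 0 is in the set.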
In profit-driven scenarios, item profitability is also incorporated (see Zhuanzhuan Commercial OCPC Product Guide).
4 Summary
There are many bias types, but not all need to be removed; for example, popularity bias in e‑commerce can be beneficial. Diversity may appear to lower relevance, yet experiments show it often improves business metrics, indicating strong user demand for diverse recommendations.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.