Mitigating Exposure Bias in Tubi’s Recommendation System
This article explains how Tubi’s machine‑learning team reduces exposure bias in its video recommendation pipeline by normalizing popularity features, incorporating additional signals such as search behavior, and applying exploration techniques like bandit algorithms to diversify content exposure.
Exposure Bias in Recommendation Systems
Recommendation systems help select content from a massive library of movies and TV shows for users and learn from user feedback. Because the current recommendation list influences future recommendations, a feedback loop can create severe exposure bias, causing a small set of items to dominate user feeds—an "information island" where many videos never get shown.
When generating personalized recommendations from user actions (clicks, views), it is essential to consider exposure bias to avoid feedback loops. For example, a new user who watches a horror movie during Halloween may be repeatedly recommended horror titles, even though they might also enjoy other genres.
In this post, we present several methods Tubi uses to address exposure bias.
Feature Engineering
A simple way to reduce exposure bias is to avoid using raw popularity as a feature. Instead, we normalize popularity by exposure, for example by using the average popularity per impression. For cold-start items with few exposures, however, this metric can be unstable.
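One common way to stabilize an exposure-normalized metric for cold-start items is to shrink it toward a global prior. The sketch below is illustrative only (the function name, prior value, and prior weight are assumptions, not Tubi's actual feature code):

```python
def smoothed_ctr(clicks, impressions, prior_ctr=0.05, prior_weight=100):
    """Exposure-normalized popularity with a Bayesian-style prior.

    Acts like the raw click-through rate when impressions are plentiful,
    but falls back to prior_ctr when an item has few exposures, so
    cold-start items get a stable (rather than noisy) estimate.
    """
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)
```

With zero impressions the estimate equals the prior; with many impressions it converges to the item's observed rate.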
Algorithm:

1. Sort items by popularity and bucket them into X groups (X is a hyper-parameter), ensuring each bucket contains roughly 1/X of the total popularity.
2. Within each bucket, sort items by their average popularity, assuming equal confidence for all items in the bucket.
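The two steps above can be sketched as follows. This is a minimal illustration of the bucketing idea, not Tubi's production implementation; the tuple layout `(item_id, popularity, avg_popularity_per_exposure)` is an assumption for readability:

```python
def popularity_buckets(items, num_buckets):
    """Bucket items so each bucket holds ~1/num_buckets of total popularity,
    then rank within each bucket by exposure-normalized popularity.

    items: list of (item_id, popularity, avg_popularity_per_exposure) tuples.
    """
    # Step 1: sort by raw popularity (descending) and split into buckets
    # that each accumulate roughly an equal share of total popularity.
    ranked = sorted(items, key=lambda x: x[1], reverse=True)
    total = sum(p for _, p, _ in ranked)
    target = total / num_buckets
    buckets, current, acc = [], [], 0.0
    for item in ranked:
        current.append(item)
        acc += item[1]
        if acc >= target and len(buckets) < num_buckets - 1:
            buckets.append(current)
            current, acc = [], 0.0
    if current:
        buckets.append(current)
    # Step 2: within each bucket, re-sort by average popularity per exposure,
    # giving under-exposed long-tail items a chance to rank higher.
    return [sorted(b, key=lambda x: x[2], reverse=True) for b in buckets]
```

Because the re-sort happens only within popularity tiers, a long-tail item can outrank a similarly popular but over-exposed one without jumping over the most popular tier entirely.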
Online experiments showed that after adding this feature, long‑tail videos received significantly more impressions, improving their ranking.
Leveraging Additional Signals
Beyond homepage recommendations, we can use other sources such as likes, search queries, and watch behavior from search results. For example, if a user’s search watch history differs from their homepage history, we can incorporate the search signals to enrich the homepage feed.
In a real example, User A primarily watched horror on the homepage but also searched and watched many documentaries. By adding search‑derived features, we were able to recommend documentaries to the user, improving both play and retention metrics.
Exploration
Exploration—showing a subset of items from a category—helps collect feedback for sparsely interacted content and reduces uncertainty. Perturbing ranking scores, for example by sampling items with probability proportional to their exponentiated scores (Boltzmann exploration), hurt user experience in our tests, so we instead built an independent exploration module that continuously gathers unbiased feedback.
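For reference, Boltzmann exploration replaces a deterministic argmax over ranking scores with softmax sampling, where a temperature parameter controls how much the ranking is perturbed. A minimal sketch (standalone, not tied to Tubi's ranking stack):

```python
import math
import random

def boltzmann_sample(scores, temperature=1.0, rng=random):
    """Sample an item index with probability proportional to exp(score / T).

    Low temperature -> close to greedy argmax; high temperature -> close
    to uniform random, i.e. more exploration.
    """
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    total = sum(weights)
    # Inverse-CDF sampling over the softmax distribution.
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(scores) - 1
```

Even with a modest temperature, every item keeps a nonzero chance of being shown, which is exactly the property that degrades the experience when applied to the main feed but is useful inside a dedicated exploration slot.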
We introduced a "Something Completely Different" category on the homepage and ran various exploration algorithms there. Feedback from this category proved valuable for improving recommendations in other categories.
Bandit Algorithms
The trade‑off between exploration and exploitation is central to bandit and reinforcement‑learning approaches. Bandit strategies help cold‑start new users by recommending fresh content while maintaining a good user experience. Our bandit model for new users broke the feedback loop of repeatedly recommending only popular items, leading to more diverse recommendations and higher satisfaction.
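To make the exploration-exploitation trade-off concrete, here is a hypothetical epsilon-greedy bandit for a cold-start user; the class, parameters, and reward signal (e.g., watch-through) are illustrative assumptions, and Tubi's production model may use a different strategy entirely:

```python
import random

class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy bandit over candidate titles ('arms')."""

    def __init__(self, arms, epsilon=0.1, rng=random):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = {a: 0 for a in arms}      # times each title was shown
        self.rewards = {a: 0.0 for a in arms}   # cumulative reward per title

    def select(self):
        arms = list(self.counts)
        # Explore: with probability epsilon, show a random title,
        # breaking the popularity feedback loop.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(arms)
        # Exploit: otherwise show the title with the best observed
        # mean reward (unseen titles default to 0.0).
        def mean(a):
            return self.rewards[a] / self.counts[a] if self.counts[a] else 0.0
        return max(arms, key=mean)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward
```

The epsilon fraction of random recommendations is what supplies fresh, unbiased feedback; the rest of the traffic preserves a good experience by exploiting what is already known.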
Conclusion
This article described Tubi’s practical approaches to mitigating exposure bias in recommendation systems, which can be quickly adapted to other platforms and problems.
If you are interested in learning more about bias mitigation, follow Tubi’s technical blog or join the Tubi Machine Learning team.
Bitu Technology
Bitu Technology is the registered company of Tubi's China team. We are engineers passionate about leveraging advanced technology to improve lives, and we hope to use this channel to connect and advance together.