MOBIUS: A Next‑Generation Multi‑Objective Recall System for Baidu Sponsored Search
This article introduces MOBIUS, Baidu's new multi-objective recall system for sponsored search. MOBIUS brings business metrics such as CPM into the recall stage alongside relevance by moving the CTR model into recall. It then uses data augmentation within a teacher-student framework to improve ad monetization while preserving relevance.
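In the classic two-stage recall-ranking pipeline, the recall stage retrieves ads mainly by query-ad relevance, while the downstream ranking stage optimizes business objectives such as CPM. Because the two stages pursue different targets, ads that would monetize well can be dropped before ranking ever sees them. MOBIUS addresses this by building business objectives like CPM directly into the recall layer, which improves the efficiency of the system as a whole.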
Innovations
Introduces CPM and other business metrics as recall criteria while maintaining relevance.
Integrates the traditional CTR prediction model into the recall stage, forming a novel commercial recall architecture.
System Architecture
The MOBIUS architecture consists of two core modules: a data-augmentation module and a model-training module. The data-augmentation module uses an active-learning teacher-student framework to generate training samples for "badcases": query-ad pairs with low relevance but high predicted CTR, which a CTR model trained on clicks alone would wrongly favor.
Data Augmentation Process
Load a batch of click logs.
Construct a query set of size N and an ad set of size M from the batch.
Form all possible query‑ad pairs (N × M samples).
Score each pair with the relevance model (the teacher) and keep only the low-relevance pairs.
Predict PCTR for the filtered pairs using a CTR model (T‑2).
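Sample from these pairs according to their PCTR, so that high-PCTR pairs are more likely to be selected, and label the sampled pairs as badcases.
Combine the augmented badcase samples with the original CTR training data, extending the binary classification (click vs. unclick) to three classes (click, unclick, badcase).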
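The augmentation steps above can be sketched in Python. This is a toy illustration, not Baidu's implementation: `relevance_score` (word-overlap Jaccard similarity standing in for the teacher relevance model), `predict_ctr` (a trivial stand-in CTR model), the relevance threshold of 0.2, and PCTR-proportional sampling with replacement are all hypothetical choices made for the example.

```python
import random
from itertools import product

def relevance_score(query: str, ad: str) -> float:
    """Hypothetical teacher: Jaccard overlap of word sets (not Baidu's relevance model)."""
    q, a = set(query.split()), set(ad.split())
    return len(q & a) / len(q | a) if q | a else 0.0

def predict_ctr(query: str, ad: str) -> float:
    """Hypothetical student/CTR model: simply favors short ad texts."""
    return 1.0 / (1 + len(ad.split()))

def augment_batch(click_log, rel_threshold=0.2, num_samples=2, seed=0):
    """click_log: list of (query, ad, clicked) tuples from one batch."""
    rng = random.Random(seed)
    queries = sorted({q for q, _, _ in click_log})  # query set, size N
    ads = sorted({a for _, a, _ in click_log})      # ad set, size M
    pairs = list(product(queries, ads))             # all N x M query-ad pairs
    # Teacher step: keep only the low-relevance pairs.
    low_rel = [(q, a) for q, a in pairs if relevance_score(q, a) < rel_threshold]
    if not low_rel:
        return []
    # CTR step: predicted CTR for each low-relevance pair.
    pctr = [predict_ctr(q, a) for q, a in low_rel]
    # Sample proportionally to PCTR (with replacement) and label as badcase.
    sampled = rng.choices(low_rel, weights=pctr, k=num_samples)
    return [(q, a, "badcase") for q, a in sampled]

def build_training_set(click_log, **kwargs):
    """Original click/unclick samples plus augmented badcases: a three-class dataset."""
    original = [(q, a, "click" if clicked else "unclick") for q, a, clicked in click_log]
    return original + augment_batch(click_log, **kwargs)

log = [
    ("cheap flights", "cheap flights to paris", True),
    ("cheap flights", "luxury hotel deals", False),
    ("running shoes", "running shoes sale", True),
]
for row in build_training_set(log):
    print(row)
```

Because the relevance filter runs before CTR prediction, the CTR model only scores the low-relevance subset of the N × M cross join rather than every pair.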
Model Training Module
The model follows a classic two-tower design. The query tower and the ad tower each output a 96-dimensional embedding, which is split into three 32-dimensional slices. The inner products of the three corresponding query and ad slices yield three scores, which a softmax layer turns into probabilities for click, unclick, and badcase.
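The three-way scoring head described above can be written out directly. In this sketch the towers themselves are omitted: `query_vec` and `ad_vec` are hand-made stand-ins for their 96-dimensional outputs, and all function names are hypothetical. Training with ordinary three-class cross-entropy is an assumption here, the standard loss for a softmax classifier, not a detail confirmed by the source.

```python
import math

EMB_DIM = 96
SLICE_DIM = EMB_DIM // 3  # 32 dimensions per slice
LABELS = ("click", "unclick", "badcase")

def slice_scores(query_vec, ad_vec):
    """Inner product of each pair of corresponding 32-d slices gives three logits."""
    if len(query_vec) != EMB_DIM or len(ad_vec) != EMB_DIM:
        raise ValueError("both tower outputs must be 96-dimensional")
    scores = []
    for k in range(3):
        lo, hi = k * SLICE_DIM, (k + 1) * SLICE_DIM
        scores.append(sum(q * a for q, a in zip(query_vec[lo:hi], ad_vec[lo:hi])))
    return scores

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Three-class loss for a single (query, ad, label) example."""
    return -math.log(probs[LABELS.index(label)])

# Hand-made stand-ins for trained tower outputs.
query_vec = [0.1] * EMB_DIM
ad_vec = [1.0] * SLICE_DIM + [0.0] * SLICE_DIM + [-1.0] * SLICE_DIM
probs = softmax(slice_scores(query_vec, ad_vec))
print(dict(zip(LABELS, probs)))
```

Because the query and ad towers do not interact before the inner products, ad embeddings can be computed offline and indexed, which is what makes vector retrieval at serving time possible.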
Online Retrieval
During online serving, the query embedding is used to retrieve high-quality candidate ads from an index of ad embeddings via Approximate Nearest Neighbor (ANN) search, for example with the FAISS library or an HNSW graph index. The system also supports Maximum Inner Product Search (MIPS), which folds business metrics directly into the similarity score, and it applies vector compression to reduce memory and storage costs.
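One common way to fold a business metric into inner-product retrieval is to scale each ad embedding by a per-ad business weight before indexing, so that the inner product becomes weight × relevance. The sketch below assumes this weighting scheme; it is not Baidu's documented method. The weight values are made up, and the brute-force search stands in for the ANN index (FAISS, HNSW) a real deployment would use.

```python
import heapq

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def build_index(ads):
    """ads: {ad_id: (embedding, business_weight)}.
    Scaling the embedding by the weight makes <q, w*a> == w * <q, a>."""
    return {ad_id: [w * x for x in emb] for ad_id, (emb, w) in ads.items()}

def mips_topk(index, query_vec, k):
    """Exact maximum inner product search (brute force); an ANN index replaces this at scale."""
    return heapq.nlargest(k, index, key=lambda ad_id: dot(query_vec, index[ad_id]))

# Hypothetical ads: (embedding, business weight such as an expected-revenue factor).
ads = {
    "ad_a": ([0.9, 0.1], 1.0),  # most similar to the query, low weight
    "ad_b": ([0.6, 0.4], 2.0),  # less similar, higher weight
    "ad_c": ([0.0, 1.0], 5.0),  # highest weight but orthogonal to the query
}
query = [1.0, 0.0]
print(mips_topk(build_index(ads), query, k=2))  # ['ad_b', 'ad_a']
```

With all weights set to 1.0, ad_a would rank first; the weighting promotes ad_b. In this example ad_c stays out of the top two despite its large weight, because its inner product with the query is zero.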
Experimental Results
Offline experiments (see the original paper) show significant improvements, and online deployment on Baidu's PC and mobile platforms yielded a substantial CPM increase over a one-week monitoring period.
Conclusion
By jointly optimizing relevance and monetization at the recall stage, MOBIUS achieves higher commercial value without sacrificing user experience, offering a promising direction for future ad retrieval systems.
Original source: https://zhuanlan.zhihu.com/p/146210155
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.