MOBIUS: A Next‑Generation Multi‑Objective Recall System for Baidu Sponsored Search

This article introduces MOBIUS, Baidu's multi‑objective recall system for sponsored search. MOBIUS moves CTR modeling into the recall stage so that candidate ads are retrieved on both relevance and business metrics such as CPM, and it uses data augmentation within a teacher‑student framework to improve ad monetization while preserving relevance.

DataFunTalk

The article presents Baidu's next‑generation multi‑objective recall system, named "MOBIUS," which enhances the classic two‑stage recall‑ranking pipeline by incorporating business objectives like CPM directly into the recall layer, thereby improving overall system efficiency.

Innovations

Introduces CPM and other business metrics as recall criteria while maintaining relevance.

Integrates the traditional CTR prediction model into the recall stage, forming a novel commercial recall architecture.
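
One way to make these two points concrete is to write recall as a constrained objective. The formula below is only a sketch under assumptions that this article does not state: revenue per impression is approximated as predicted CTR times bid (so eCPM is 1000 times that quantity), and relevance enters as a lower bound with a hypothetical threshold \tau. The symbols q_i, a_i, pCTR, bid, and Rel are illustrative names.

```latex
\max \; \sum_{i=1}^{n} \mathrm{pCTR}(q_i, a_i) \cdot \mathrm{bid}(a_i)
\quad \text{s.t.} \quad
\frac{1}{n} \sum_{i=1}^{n} \mathrm{Rel}(q_i, a_i) \ge \tau
```

Read this way, a recall layer that looks only at relevance enforces the constraint but ignores the objective, which is the gap that the two innovations above address.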

System Architecture

The MOBIUS architecture consists of two core modules: a data‑augmentation module and a model‑training module. The data‑augmentation module uses an active‑learning teacher‑student framework, with the relevance model acting as teacher for the CTR model, to generate training samples that expose badcases: query‑ad pairs with low relevance but high predicted CTR.

Data Augmentation Process

Load a batch of click logs.

Construct query and ad sets from the batch.

Form all possible query‑ad pairs by taking the cross product of the two sets (N × M samples for N queries and M ads).

Score each pair with a relevance model and keep only the low‑relevance pairs (those scoring below a threshold).

Predict PCTR for the filtered pairs using a CTR model (T‑2).

Sample from these low‑relevance pairs according to PCTR, and label the sampled pairs as badcase.

Combine the augmented badcase samples with original CTR training data, extending the binary classification to three classes (click, unclick, badcase).
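
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Baidu's implementation: `toy_relevance` and `toy_pctr` are hypothetical stand-ins for the relevance model and the CTR model, the threshold `rel_threshold` and sample count `k` are assumed parameters, and sampling in proportion to PCTR is one plausible reading of "sample based on PCTR".

```python
import itertools
import random

CLICK, UNCLICK, BADCASE = 0, 1, 2  # the three class labels

def toy_relevance(query, ad):
    """Hypothetical teacher: Jaccard overlap of the word sets, in [0, 1]."""
    q, a = set(query.split()), set(ad.split())
    return len(q & a) / len(q | a)

def toy_pctr(query, ad):
    """Hypothetical CTR model: a fixed lookup with a small default value."""
    table = {("cheap flights", "luxury watch sale"): 0.30}
    return table.get((query, ad), 0.05)

def augment(click_log, relevance_fn, ctr_fn, rel_threshold=0.2, k=2, seed=0):
    """click_log is a list of (query, ad, clicked) tuples (step 1)."""
    rng = random.Random(seed)
    # Original CTR data keeps its binary labels.
    samples = [(q, a, CLICK if clicked else UNCLICK) for q, a, clicked in click_log]
    # Steps 2-3: build the query and ad sets, then cross-join them.
    queries = sorted({q for q, _, _ in click_log})
    ads = sorted({a for _, a, _ in click_log})
    pairs = list(itertools.product(queries, ads))
    # Step 4: keep only pairs the relevance model scores below the threshold.
    low_rel = [p for p in pairs if relevance_fn(*p) < rel_threshold]
    # Step 5: predicted CTR for each remaining pair.
    weights = [ctr_fn(*p) for p in low_rel]
    # Step 6: draw pairs with probability proportional to PCTR, label as badcase.
    if low_rel and sum(weights) > 0:
        drawn = set(rng.choices(low_rel, weights=weights, k=k))
        samples += [(q, a, BADCASE) for q, a in sorted(drawn)]
    # Step 7: the combined list is the three-class training set.
    return samples

log = [
    ("cheap flights", "cheap flights to paris", True),
    ("cheap flights", "luxury watch sale", False),
    ("running shoes", "running shoes discount", True),
]
augmented = augment(log, toy_relevance, toy_pctr)
```

In this sketch a pair that already appears in the log can also be drawn as a badcase; a production pipeline would need an explicit policy for such label conflicts.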

Model Training Module

The model follows a classic dual‑tower design. One tower encodes the user query and the other encodes the ad, each into a 96‑dimensional embedding. Each embedding is split into three 32‑dimensional vectors, and the inner products of corresponding query and ad vectors yield three scores, which are fed into a softmax layer to predict click, unclick, or badcase.
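
The scoring head described here can be sketched without any framework. The snippet below assumes the two towers have already produced 96‑dimensional embeddings (random vectors stand in for them), and the mapping of the three slices to click, unclick, and badcase in that order is an assumption made for illustration.

```python
import math
import random

EMB_DIM, N_PARTS = 96, 3
PART_DIM = EMB_DIM // N_PARTS  # 32 dimensions per slice

def split(vec):
    """Cut a 96-dimensional embedding into three 32-dimensional slices."""
    return [vec[i * PART_DIM:(i + 1) * PART_DIM] for i in range(N_PARTS)]

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def three_class_probs(query_emb, ad_emb):
    """Return assumed [P(click), P(unclick), P(badcase)] for one query-ad pair."""
    assert len(query_emb) == len(ad_emb) == EMB_DIM
    scores = [
        sum(x * y for x, y in zip(q_part, a_part))  # inner product per slice
        for q_part, a_part in zip(split(query_emb), split(ad_emb))
    ]
    return softmax(scores)

# Random vectors stand in for the outputs of the query and ad towers.
rng = random.Random(0)
query_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]
ad_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]
probs = three_class_probs(query_emb, ad_emb)
```

Because the query and ad interact only through inner products, ad embeddings can be computed offline and indexed, which is what makes the online retrieval step possible.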

Online Retrieval

During online serving, query embeddings are used to retrieve high‑quality candidate ads via Approximate Nearest Neighbor (ANN) search (e.g., FAISS, HNSW). The system also supports Maximum Inner Product Search (MIPS) to embed business metrics directly into similarity calculations, and employs vector compression to reduce memory and storage costs.
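
To illustrate how a business metric can be folded into the similarity itself, the sketch below scales each ad vector by a per‑ad weight before indexing, so that a plain inner‑product search ranks by weight times similarity. This is an assumed formulation for illustration only: the brute‑force `top_k` stands in for a real ANN or MIPS index, and the ad vectors and `bids` are made‑up values.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def build_index(ad_vectors, business_weights):
    """Scale each ad vector so dot(query, scaled) == weight * dot(query, ad)."""
    return {
        ad_id: [business_weights[ad_id] * x for x in vec]
        for ad_id, vec in ad_vectors.items()
    }

def top_k(query_vec, index, k):
    """Exhaustive maximum-inner-product search; a real system would use ANN."""
    ranked = sorted(index.items(), key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [ad_id for ad_id, _ in ranked[:k]]

# Hypothetical 2-dimensional ad embeddings and bids.
ads = {"ad_a": [1.0, 0.0], "ad_b": [0.8, 0.6], "ad_c": [0.0, 1.0]}
bids = {"ad_a": 1.0, "ad_b": 2.0, "ad_c": 0.5}
query = [1.0, 0.0]

plain = top_k(query, build_index(ads, {ad_id: 1.0 for ad_id in ads}), 2)
weighted = top_k(query, build_index(ads, bids), 2)
```

With uniform weights the closest ad, `ad_a`, ranks first; once bids are folded in, the slightly less similar but higher‑bid `ad_b` overtakes it, with no separate re‑ranking pass after retrieval.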

Experimental Results

Offline experiments (see the original paper) demonstrate significant improvements, and online deployment on Baidu's PC and mobile platforms showed a substantial CPM increase over one week of monitoring.

Conclusion

By jointly optimizing relevance and monetization at the recall stage, MOBIUS achieves higher commercial value without sacrificing user experience, offering a promising direction for future ad retrieval systems.

Original source: https://zhuanlan.zhihu.com/p/146210155
