Multi-Path Recall and Ranking Techniques in Real-Time Bidding Advertising Systems
In real‑time bidding advertising, a multi‑path recall framework quickly filters billions of ads by running non‑personalized and personalized strategies in parallel, including hot‑item rules, collaborative filtering, skip‑gram vectors, and GraphSAGE embeddings, while respecting targeting constraints. A ranking stage then optimizes eCPM. Effectiveness is measured both offline and online, and future extensions with large language models are planned.
In real‑time bidding (RTB) advertising, the ad system must quickly select suitable ads for an incoming request. As in recommendation systems, the process is divided into two main stages: recall and ranking. Recall quickly filters a massive candidate pool down to a top‑N set of ads, while ranking optimizes eCPM (effective cost per thousand impressions) and overall revenue.
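To make the ranking objective concrete, here is a minimal sketch that sorts recalled candidates by eCPM. It assumes click-priced (CPC) ads and the common definition eCPM = pCTR × bid × 1000; the article does not state the formula it uses, and the `Candidate` fields and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    pctr: float     # predicted click-through rate (hypothetical model output)
    cpc_bid: float  # advertiser bid per click, in currency units


def ecpm(c: Candidate) -> float:
    """Expected revenue per 1000 impressions for a CPC-priced ad (assumed formula)."""
    return c.pctr * c.cpc_bid * 1000.0


def rank_by_ecpm(candidates, top_n):
    """Return the top_n candidates with the highest eCPM."""
    return sorted(candidates, key=ecpm, reverse=True)[:top_n]


# Toy candidates as they might arrive from the recall stage.
candidates = [
    Candidate("ad_a", pctr=0.02, cpc_bid=0.5),
    Candidate("ad_b", pctr=0.01, cpc_bid=1.5),
    Candidate("ad_c", pctr=0.05, cpc_bid=0.1),
]
ranked = rank_by_ecpm(candidates, top_n=2)
print([c.ad_id for c in ranked])
```

Note that the ad with the highest predicted CTR (`ad_c`) is not selected here, because its low bid gives it the lowest expected revenue.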
Recall aims to discard irrelevant items and keep only those likely to interest the user. Because of performance constraints, recall runs multiple simple strategies in parallel (multi‑path recall) rather than relying on a single complex model. Strategies include non‑personalized recall (hot items, new material, high‑click or high‑conversion ads) and personalized recall (similarity‑based, collaborative filtering, deep models with vector embeddings).
Figure 1: Advertising scoring pipeline.
The system implements a multi‑path recall framework that adapts to different business lines and ad slots. In playback‑page scenarios, content‑related recall uses album categories and text matching to increase ad‑content similarity, achieving an 18.4% CTR lift and an eCPM increase of more than 5%. In general scenarios (e.g., feed), personalized recall combines item‑based collaborative filtering, vector recall, and graph‑model recall, which provide diverse and complementary candidate sets.
Figure 2: Evolution of recall optimization.
Recall also incorporates advertiser‑level constraints (targeting conditions, time windows) by filtering the candidate set against real‑time ad status indexes. User data (tags, context) and ad data (availability, targeting rules) are merged to form the initial candidate pool.
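The constraint check described above can be sketched as a lookup against an in-memory status index. This is an illustration only: the `AdStatus` fields (region targeting, an hourly delivery window) and the index layout are assumptions, not the system's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class AdStatus:
    ad_id: str
    active: bool                               # e.g. budget left, not paused
    regions: set = field(default_factory=set)  # empty set means no region limit
    start_hour: int = 0                        # delivery window start, inclusive
    end_hour: int = 24                         # delivery window end, exclusive


def is_eligible(ad: AdStatus, user_region: str, hour: int) -> bool:
    """Check availability, region targeting, and the time window."""
    if not ad.active:
        return False
    if ad.regions and user_region not in ad.regions:
        return False
    return ad.start_hour <= hour < ad.end_hour


def filter_candidates(candidate_ids, status_index, user_region, hour):
    """Keep only recalled ads that pass every advertiser-level constraint."""
    eligible = []
    for ad_id in candidate_ids:
        ad = status_index.get(ad_id)  # ads missing from the index are dropped
        if ad is not None and is_eligible(ad, user_region, hour):
            eligible.append(ad_id)
    return eligible


status_index = {
    "ad1": AdStatus("ad1", active=True),
    "ad2": AdStatus("ad2", active=True, regions={"shanghai"}),
    "ad3": AdStatus("ad3", active=False),
    "ad4": AdStatus("ad4", active=True, start_hour=18, end_hour=23),
}
eligible = filter_candidates(
    ["ad1", "ad2", "ad3", "ad4", "ad5"], status_index,
    user_region="beijing", hour=20,
)
print(eligible)
```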
Figure 4: Multi‑path recall architecture.
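A multi-path recall step of this kind can be sketched as several independent recall functions run in parallel and then merged under per-path quotas. The three path functions, their fixed outputs, and the quota values below are hypothetical stand-ins; the article does not describe its actual merge logic.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical recall paths; each returns ad IDs in descending relevance.
def hot_items(user_id):
    return ["ad1", "ad2", "ad3"]


def item_cf(user_id):
    return ["ad2", "ad4", "ad5"]


def vector_recall(user_id):
    return ["ad5", "ad6", "ad7"]


def multi_path_recall(user_id, paths, quotas):
    """Run all paths concurrently, then merge with per-path quotas and dedup."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = {name: pool.submit(fn, user_id) for name, fn in paths.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    merged, seen = [], set()
    for name in paths:  # dict order doubles as path priority
        taken = 0
        for ad_id in results[name]:
            if taken >= quotas.get(name, 0):
                break
            if ad_id not in seen:  # duplicates do not consume quota
                seen.add(ad_id)
                merged.append((ad_id, name))
                taken += 1
    return merged


paths = {"hot": hot_items, "item_cf": item_cf, "vector": vector_recall}
quotas = {"hot": 2, "item_cf": 2, "vector": 2}
merged = multi_path_recall("user_42", paths, quotas)
print(merged)
```

Keeping the source path next to each ad ID makes it possible to attribute later impressions and clicks to the path that produced them.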
For rule‑based personalized recall, classic algorithms such as Item‑based Collaborative Filtering (ItemCF) are adapted to the advertising domain by using shared advertiser click statistics instead of user‑item interactions. Offline pipelines build user→ad preference indexes and ad→similar‑ad indexes; online recall queries these indexes to retrieve top‑K similar ads.
Figure 8: Collaborative filtering similarity calculation.
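As a reference point, the sketch below implements classic ItemCF with co-click cosine similarity, covering both the offline index build and the online lookup. It is not the article's adapted variant, whose exact statistics are not given; the click log and function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similarity_index(user_clicks, top_k):
    """Offline step: ad -> top_k most similar ads by co-click cosine similarity."""
    click_count = defaultdict(int)
    co_count = defaultdict(lambda: defaultdict(int))
    for ads in user_clicks.values():
        unique = sorted(set(ads))
        for ad in unique:
            click_count[ad] += 1
        for a, b in combinations(unique, 2):
            co_count[a][b] += 1
            co_count[b][a] += 1

    index = {}
    for a, neighbors in co_count.items():
        scored = [
            (b, c / sqrt(click_count[a] * click_count[b]))
            for b, c in neighbors.items()
        ]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        index[a] = scored[:top_k]
    return index


def recall_for_user(clicked, index, top_k):
    """Online step: sum similarities from the user's clicked ads to unseen ads."""
    scores = defaultdict(float)
    for ad in clicked:
        for other, sim in index.get(ad, []):
            if other not in clicked:
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [ad for ad, _ in ranked[:top_k]]


# Toy click log: user -> clicked ads.
user_clicks = {
    "u1": ["a", "b"],
    "u2": ["a", "b", "c"],
    "u3": ["b", "c"],
    "u4": ["a"],
}
index = build_similarity_index(user_clicks, top_k=2)
print(recall_for_user({"a"}, index, top_k=2))
```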
Model‑driven recall includes:
Skip‑gram vector recall: ads are treated as words in a sequence; a skip‑gram Word2Vec model learns ad embeddings, enabling similarity search via vector distance.
GraphSAGE graph‑model recall: a graph neural network aggregates multi‑type node information (users, ads, clicks) to produce robust ad embeddings, addressing data sparsity and improving long‑tail recommendation.
Figure 9: Skip‑gram training illustration.
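The two halves of skip-gram vector recall can be illustrated without a training library: generating (center, context) pairs from an ad sequence, and searching learned embeddings by cosine similarity. Training itself is omitted, and the window size and toy embedding values below are assumptions chosen for illustration.

```python
from math import sqrt


def skipgram_pairs(sequence, window):
    """Generate (center, context) training pairs within a sliding window."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(ad_id, embeddings, top_k):
    """Brute-force nearest neighbours of ad_id in embedding space."""
    query = embeddings[ad_id]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != ad_id
    ]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]


# A user's click sequence, treated like a sentence of ad "words".
pairs = skipgram_pairs(["ad_a", "ad_b", "ad_c"], window=1)

# Toy embeddings standing in for vectors a trained model would produce.
embeddings = {"ad_a": [1.0, 0.0], "ad_b": [0.9, 0.1], "ad_c": [0.0, 1.0]}
print(pairs)
print(most_similar("ad_a", embeddings, top_k=1))
```

At production scale the brute-force scan would normally be replaced by an approximate nearest-neighbour index; the linear search here only keeps the example self-contained.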
Figure 10: User‑click graph construction.
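The aggregation idea behind GraphSAGE can be shown with a single mean-aggregator layer on a tiny user-click graph. This is a simplified forward pass under stated assumptions: the weights are fixed identity matrices rather than learned, and the node features, graph, and `sample_size` are invented for the example.

```python
import random


def mean_vectors(vectors, dim):
    """Element-wise mean; a zero vector if the node has no neighbours."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sage_layer(features, adjacency, w_self, w_neigh, sample_size, rng):
    """One mean-aggregator layer: ReLU(W_self*h_v + W_neigh*mean(h_neighbours))."""
    dim = len(next(iter(features.values())))
    out = {}
    for node, h in features.items():
        neighbors = adjacency.get(node, [])
        if len(neighbors) > sample_size:  # fixed-size neighbour sampling
            neighbors = rng.sample(neighbors, sample_size)
        agg = mean_vectors([features[n] for n in neighbors], dim)
        combined = [
            a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, agg))
        ]
        out[node] = [max(0.0, x) for x in combined]  # ReLU
    return out


# Toy bipartite click graph: u1 clicked ad_a and ad_b, u2 clicked ad_b.
features = {
    "u1": [1.0, 0.0],
    "u2": [0.0, 1.0],
    "ad_a": [1.0, 1.0],
    "ad_b": [0.0, 0.0],
}
adjacency = {
    "u1": ["ad_a", "ad_b"],
    "u2": ["ad_b"],
    "ad_a": ["u1"],
    "ad_b": ["u1", "u2"],
}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights
embeddings = sage_layer(
    features, adjacency, identity, identity,
    sample_size=5, rng=random.Random(0),
)
print(embeddings["ad_b"])
```

Here `ad_b` starts with an all-zero feature vector yet ends up with a non-zero embedding borrowed from the users who clicked it, which is the mechanism that helps sparse, long-tail ads.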
Specific scenario recall addresses domain‑specific needs, such as e‑commerce ads (SKU‑based vectorization) and child‑focused playback pages (content safety filtering via text classification).
Effectiveness is evaluated both offline (recall precision, coverage, diversity, novelty) and online (A/B‑test metrics such as exposure share, CTR, and consumption time). The quota allocated to each recall path is adjusted dynamically based on that path's performance.
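Two of the offline metrics and one possible quota rule can be sketched as follows. The metric definitions are the standard ones, but the reallocation policy (a guaranteed floor per path plus a CTR-proportional share) is an assumption, since the article does not say how quotas are adjusted; all numbers are illustrative.

```python
def precision_at_k(recalled, relevant, k):
    """Fraction of the top-k recalled ads that the user actually engaged with."""
    top = recalled[:k]
    if not top:
        return 0.0
    return sum(1 for ad in top if ad in relevant) / len(top)


def coverage(recalled_lists, catalog_size):
    """Share of the ad catalog that appears in at least one recall list."""
    distinct = set()
    for ads in recalled_lists:
        distinct.update(ads)
    return len(distinct) / catalog_size


def reallocate_quotas(path_ctr, total_quota, min_quota):
    """Give each path a floor, then split the rest in proportion to CTR."""
    quotas = {name: min_quota for name in path_ctr}
    remaining = total_quota - min_quota * len(path_ctr)
    total_ctr = sum(path_ctr.values())
    if remaining <= 0 or total_ctr == 0:
        return quotas
    for name, ctr in path_ctr.items():
        quotas[name] += int(remaining * ctr / total_ctr)
    # Slots lost to rounding go to the best-performing path.
    best = max(path_ctr, key=path_ctr.get)
    quotas[best] += total_quota - sum(quotas.values())
    return quotas


p = precision_at_k(["a", "b", "c", "d"], relevant={"a", "c"}, k=4)
cov = coverage([["a", "b"], ["b", "c"]], catalog_size=10)
quotas = reallocate_quotas(
    {"hot": 0.01, "item_cf": 0.03, "vector": 0.02},
    total_quota=100, min_quota=10,
)
print(p, cov, quotas)
```

The floor keeps a weak path from being starved completely, so it continues to collect enough traffic for its performance to be measured.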
Future directions include leveraging large language models (LLMs) for richer multimodal representations and aligning recall objectives with downstream ranking goals while avoiding feedback loops.
Ximalaya Technology Team
Official account of Ximalaya's technology team, sharing distilled technical experience and insights to grow together.