How Online Recommendation Engines Work: From Traffic Allocation to Real-Time Ranking
This article explains the online side of recommendation systems, covering data flow, traffic allocation, AB testing, the gateway, recall mechanisms, and real-time ranking, while also describing how offline-trained models are deployed as binary weight files for live serving.
Framework Overview
The previous article introduced recommendation systems as a whole and divided them into offline and online components. The offline side handles data collection, storage, statistics, and model training; the online side handles user requests, model serving, and online learning. This article focuses on the online part, especially real-time ranking.
Data Flow
The user requests data, either by refreshing the app or through a default request, and the request is sent through the gateway to the engine.
The user generates implicit or explicit feedback, which the app collects and sends through the gateway to Flume and then to Kafka for downstream modules.
These two streams will be discussed further.
Traffic Allocation
The traffic allocation module distributes traffic dynamically and in real time, while ABTest serves multiple versions of a page or flow so that their results can be compared statistically. For example, iOS users may be shown more paid videos and Android users more original series, while ABTest evaluates a new algorithm or UI across both platforms.
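One common way to split users between experiment versions is deterministic hash bucketing. The sketch below is an illustration under assumptions, not the method this system is known to use: the function names, the 100-bucket layout, and the 10% treatment share are all hypothetical.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, num_buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, num_buckets) for one experiment."""
    # Salting the hash with the experiment name keeps bucket assignments
    # independent between experiments (hypothetical naming scheme).
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def assign_variant(user_id: str, experiment: str, treatment_percent: int = 10) -> str:
    """Send the first `treatment_percent` buckets to the new version."""
    bucket = assign_bucket(user_id, experiment)
    return "treatment" if bucket < treatment_percent else "control"
```

Because the assignment depends only on the user ID and the experiment name, a user sees the same version on every request, which keeps the statistical comparison between versions clean.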
Recommendation Engine
The engine consists of the gateway, recall (match), and ranking. Offline algorithms such as collaborative filtering (ItemCF, UserCF), matrix factorization (ALS), and deep learning methods (Wide & Deep) produce binary weight files after training.
These binary files are typically copied to online servers via scripts or stored in databases, although their large size makes database storage a questionable choice.
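As a rough illustration of the script-based route, the sketch below writes a flat list of float32 weights to a binary file and swaps it into place atomically, so a serving process never reads a half-written file. The file layout and the names `publish_weights` and `load_weights` are assumptions made for this example; real model formats are framework-specific.

```python
import os
import tempfile
from array import array


def publish_weights(path, weights):
    """Write float32 weights to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        array("f", weights).tofile(f)
    # os.replace is atomic when source and target are on the same filesystem.
    os.replace(tmp_path, path)


def load_weights(path):
    """Read the whole file back as a list of float32 values."""
    weights = array("f")
    with open(path, "rb") as f:
        weights.frombytes(f.read())
    return weights.tolist()
```

Writing to a temporary file in the same directory and renaming it is a simple way to let the online server pick up a new model version without a partial read.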
Gateway
The gateway handles user request validation, parameter parsing, and response assembly. It also manages “fake exposure”: it stores message IDs in NoSQL and later reinserts them into the recall queue based on real exposure data.
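The fake-exposure handling can be sketched as follows, under one reading of the description: messages that were returned to the client but never actually displayed are put back into the recall queue. That interpretation, the class name `FakeExposureTracker`, and the in-memory dict standing in for the NoSQL store are all assumptions.

```python
from collections import deque


class FakeExposureTracker:
    """Tracks delivered message IDs and requeues those never really shown."""

    def __init__(self):
        # user_id -> delivered message IDs; a dict stands in for NoSQL here.
        self._delivered = {}

    def record_delivered(self, user_id, message_ids):
        """Remember which messages were returned to the client."""
        self._delivered.setdefault(user_id, []).extend(message_ids)

    def reconcile(self, user_id, exposed_ids, recall_queue):
        """Requeue delivered messages that are missing from real exposure data."""
        exposed = set(exposed_ids)
        delivered = self._delivered.pop(user_id, [])
        unexposed = [m for m in delivered if m not in exposed]
        recall_queue.extend(unexposed)
        return unexposed
```

For example, if messages a, b, and c were delivered and only b appears in the real exposure data, `reconcile` appends a and c to the recall queue so they can be recommended again.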
Recall
After a request reaches the recall engine, various recallers are triggered based on user behavior and configuration (e.g., region). Cold-start users (those with fewer than 5 to 10 clicks) are served by cold-start and hot recallers to ensure freshness and popularity. The recall manager aggregates the message_id inverted lists from all recallers and passes them to a second-stage ranking.
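A minimal sketch of recaller selection and aggregation is shown below. The recaller names, the threshold of 5 clicks, the limit of 400, and the round-robin merge are assumptions chosen for illustration; the source does not say how the recall manager combines the lists.

```python
from itertools import zip_longest


def select_recallers(click_count, cold_start_threshold=5):
    """Pick recaller names for a user; the names here are hypothetical."""
    if click_count < cold_start_threshold:
        return ["cold_start", "hot"]
    return ["profile", "hot"]


def merge_recall_lists(recall_lists, limit=400):
    """Interleave per-recaller message_id lists, dropping duplicates."""
    merged, seen = [], set()
    # zip_longest walks the lists rank by rank, padding short lists with None.
    for tier in zip_longest(*recall_lists):
        for message_id in tier:
            if message_id is None or message_id in seen:
                continue
            seen.add(message_id)
            merged.append(message_id)
            if len(merged) >= limit:
                return merged
    return merged
```

Interleaving by rank gives every recaller a share of the candidate set, so one recaller with a very long list cannot crowd out the others before ranking.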
Recall stage: generate candidate set from user profile, preferences, hot labels; cold-start service used for sparse profiles.
Filter stage: apply manual and policy rules to remove illegal or undesirable content.
Feature computation stage: compute feature vectors using real-time behavior, profile, and knowledge graph.
Sorting stage: coarse-rank the 200‑400 candidates and keep the top 100‑200 for final ranking, which reduces latency.
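The filter, feature computation, and sorting stages above can be sketched as a single function. This is a simplified illustration: the linear dot-product score, the `features_by_id` lookup, and the function names are assumptions, and a production coarse ranker would use whatever lightweight model the team has trained.

```python
import heapq


def dot(weights, features):
    """Linear score: sum of weight * feature value."""
    return sum(w * x for w, x in zip(weights, features))


def coarse_rank(candidates, blocked_ids, features_by_id, weights, top_k=100):
    """Filter blocked items, score the rest, and keep the top_k by score."""
    # Filter stage: drop anything excluded by manual or policy rules.
    allowed = [m for m in candidates if m not in blocked_ids]
    # Feature and sorting stages: look up each feature vector and score it.
    return heapq.nlargest(
        top_k, allowed, key=lambda m: dot(weights, features_by_id[m])
    )
```

Using `heapq.nlargest` avoids sorting the full candidate list when only the top slice is passed on to final ranking.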
After recall and sorting, the message_ids are passed to the detail service, which assembles the final recommendation results, followed by a tuner service for overall adjustment.
