How Online Recommendation Engines Work: From Traffic Allocation to Real-Time Ranking

This article explains the online side of recommendation systems, covering data flow, traffic allocation, AB testing, the gateway, recall mechanisms, and real-time ranking, while also describing how offline-trained models are deployed as binary weight files for live serving.

ITFLY8 Architecture Home

Jun 8, 2018

Framework Overview

The previous article introduced recommendation systems as a whole and divided them into offline and online components. The offline side handles data collection, storage, statistics, and model training; the online side handles user requests, model serving, and online learning. This article focuses on the online part, especially real-time ranking.

Data Flow

A user requests data by refreshing the app or through a default request; the request is sent through the gateway to the engine.

A user generates implicit or explicit feedback, which the app collects and sends through the gateway to Flume and then on to Kafka, where downstream modules consume it.

These two streams will be discussed further.
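As a rough illustration of the feedback stream, the sketch below shows what a single event might look like before the gateway hands it to Flume. The field names, the set of explicit actions, and the JSON encoding are all assumptions made for this example; the article does not describe the actual message format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical set of actions treated as explicit feedback.
EXPLICIT_ACTIONS = {"rating", "like", "dislike"}


@dataclass
class FeedbackEvent:
    """One feedback record; every field name here is an assumption."""
    user_id: str
    message_id: str
    action: str   # e.g. "click" (implicit) or "rating" (explicit)
    value: float  # e.g. 1.0 for a click, or the rating given
    ts_ms: int    # client timestamp in milliseconds

    @property
    def explicit(self) -> bool:
        return self.action in EXPLICIT_ACTIONS


def encode(event: FeedbackEvent) -> bytes:
    """Serialize an event to UTF-8 JSON bytes for the log pipeline."""
    payload = asdict(event)
    payload["explicit"] = event.explicit
    return json.dumps(payload, sort_keys=True).encode("utf-8")
```

Tagging each event as implicit or explicit at the source lets downstream consumers weight the two kinds of signal differently without re-deriving the distinction.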

Traffic Allocation

The traffic allocation module distributes traffic dynamically and in real time, while the ABTest module creates multiple versions of a page or flow so that they can be compared statistically. For example, iOS users may be shown more paid videos and Android users more original series, while ABTest evaluates a new algorithm or UI across both platforms.
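One common way to split traffic for an A/B test is to hash the user ID into a fixed number of buckets and map bucket ranges to variants. The sketch below assumes that approach; the article does not say how its ABTest module assigns users, and the experiment name, variant names, and 50/50 split are hypothetical.

```python
import hashlib

# Hypothetical experiment: variant name and its share of 100 buckets.
EXPERIMENT = {
    "name": "ranking_v2_test",
    "variants": [("control", 50), ("new_ranker", 50)],
}


def bucket_of(user_id: str, experiment_name: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) by hashing."""
    key = f"{experiment_name}:{user_id}".encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % buckets


def assign_variant(user_id: str, experiment: dict) -> str:
    """Walk the cumulative shares until the user's bucket falls inside one."""
    bucket = bucket_of(user_id, experiment["name"])
    upper = 0
    for variant, share in experiment["variants"]:
        upper += share
        if bucket < upper:
            return variant
    # Fallback if the shares add up to less than 100.
    return experiment["variants"][0][0]
```

Including the experiment name in the hash key means the same user can land in different buckets for different experiments, which keeps concurrent tests from sharing one split.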

Recommendation Engine

The engine consists of three parts: the gateway, recall (match), and ranking. Offline algorithms such as collaborative filtering (ItemCF, UserCF), matrix factorization (ALS), and deep learning methods (Wide & Deep) produce binary weight files after training.

These binary files are typically copied to the online servers by scripts or stored in a database, although their large size makes database storage a questionable choice.
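To make the idea of a binary weight file concrete, here is a minimal sketch in which an offline job writes a flat list of float32 weights and the online server loads them to score a feature vector. The file layout (a 4-byte tag, a count, then the values) and the linear scorer are assumptions for illustration only; real model formats are framework-specific and not described in the article.

```python
import struct
from pathlib import Path

MAGIC = b"WGT1"  # hypothetical 4-byte tag identifying the file type


def save_weights(path, weights):
    """Offline side: write tag, little-endian uint32 count, float32 values."""
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(struct.pack("<I", len(weights)))
        f.write(struct.pack(f"<{len(weights)}f", *weights))


def load_weights(path):
    """Online side: read the file back into a list of floats."""
    data = Path(path).read_bytes()
    if data[:4] != MAGIC:
        raise ValueError("not a weight file")
    (count,) = struct.unpack_from("<I", data, 4)
    return list(struct.unpack_from(f"<{count}f", data, 8))


def score(weights, features):
    """Linear score: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))
```

Because the server only reads a file, deploying a new model in this sketch amounts to copying a new file into place and reloading it, which matches the script-based copy the article describes.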

Gateway

The gateway handles user request validation, parameter parsing, and response assembly. It also manages “fake exposure”: it stores message IDs in NoSQL and later reinserts them into the recall queue based on real exposure data.
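The fake-exposure handling can be sketched as follows, under one reading of the description above: the gateway remembers which message IDs it returned, and when real exposure data arrives, any ID that was returned but never reported as shown goes back into the recall queue. The class, its method names, and the in-memory dictionaries standing in for the NoSQL store are all hypothetical.

```python
from collections import deque


class ExposureTracker:
    """In-memory stand-in for the NoSQL store mentioned in the article."""

    def __init__(self):
        self.delivered = {}     # user_id -> set of message_ids sent to the client
        self.recall_queue = {}  # user_id -> deque of message_ids eligible again

    def record_delivery(self, user_id, message_ids):
        """Remember what the gateway returned in a response."""
        self.delivered.setdefault(user_id, set()).update(message_ids)

    def reconcile(self, user_id, exposed_ids):
        """Requeue messages that were delivered but never reported as exposed."""
        pending = self.delivered.pop(user_id, set())
        fake = sorted(pending - set(exposed_ids))
        self.recall_queue.setdefault(user_id, deque()).extend(fake)
        return fake
```

For example, if `m1`, `m2`, and `m3` were delivered and only `m2` was reported as exposed, `reconcile` requeues `m1` and `m3` so they can be recommended again.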

Recall

After a request reaches the recall engine, various recallers are triggered according to user behavior and configuration (e.g., region). Cold-start users (those with fewer than 5 to 10 clicks) are served by the cold-start and hot recallers, which keep results fresh and popular. The recall manager then aggregates the message_id inverted lists from all recallers and passes them to a second-stage ranking.
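A minimal sketch of a recall manager along these lines is shown below. The recaller functions, the user dictionary layout, and the threshold of 5 clicks (the lower end of the 5 to 10 range) are assumptions for illustration; real recallers would query inverted indexes rather than return fixed lists.

```python
COLD_START_CLICKS = 5  # assumed threshold; the article gives a range of 5 to 10


def cold_start_recaller(user):
    return ["m_new1", "m_hot1"]


def hot_recaller(user):
    return ["m_hot1", "m_hot2"]


def profile_recaller(user):
    return [f"m_{tag}" for tag in user["tags"]]


def region_recaller(user):
    return [f"m_{user['region']}_1"]


def select_recallers(user):
    """Choose recallers from user behavior (click count) and configuration."""
    if user["clicks"] < COLD_START_CLICKS:
        return [cold_start_recaller, hot_recaller]
    return [profile_recaller, region_recaller, hot_recaller]


def recall(user):
    """Merge candidate lists, dropping duplicates and keeping first-seen order."""
    seen, merged = set(), []
    for recaller in select_recallers(user):
        for message_id in recaller(user):
            if message_id not in seen:
                seen.add(message_id)
                merged.append(message_id)
    return merged
```

Deduplicating while merging matters because several recallers often return the same popular item, and the ranking stage should score each candidate only once.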

Recall stage: generate a candidate set from the user profile, preferences, and hot labels; the cold-start service is used for sparse profiles.

Filter stage: apply manual and policy rules to remove illegal or undesirable content.

Feature computation stage: compute feature vectors from real-time behavior, the user profile, and the knowledge graph.

Sorting stage: coarse-rank the 200 to 400 candidates and keep the top 100 to 200 for final ranking, which reduces latency.
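The filter, feature computation, and sorting stages can be sketched as a small funnel. Everything specific here is an assumption for illustration: the blocklist, the two-element feature vector (tag overlap and popularity), and the linear weights of the coarse scorer are not from the article.

```python
import heapq

BLOCKED_IDS = {"m_banned"}   # hypothetical policy blocklist
COARSE_WEIGHTS = [1.0, 0.5]  # assumed weights for a cheap linear scorer


def filter_candidates(candidates):
    """Filter stage: drop blocklisted items and items flagged as illegal."""
    return [
        c for c in candidates
        if c["id"] not in BLOCKED_IDS and not c.get("illegal", False)
    ]


def features(candidate, user):
    """Feature stage: a tiny vector of [tag overlap, popularity]."""
    overlap = len(set(candidate["tags"]) & set(user["tags"]))
    return [float(overlap), candidate["popularity"]]


def coarse_rank(candidates, user, keep):
    """Sorting stage: score cheaply and keep only the top `keep` IDs."""
    scored = [
        (sum(w * x for w, x in zip(COARSE_WEIGHTS, features(c, user))), c["id"])
        for c in candidates
    ]
    return [cid for _, cid in heapq.nlargest(keep, scored)]
```

The point of the coarse pass is cost: a cheap scorer trims the candidate set so that the more expensive final ranking model runs on fewer items.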

After recall and sorting, the message_ids are passed to the detail service, which assembles the final recommendation results, followed by a tuner service for overall adjustment.
