
How Modern Recommendation Systems Work: Architecture, Algorithms, and Best Practices

This article explains the goals, architectures, data pipelines, recall strategies, and ranking models of contemporary recommendation systems, covering both online and offline components, collaborative filtering, content-based methods, feature engineering, and practical interview insights for engineers.

ITFLY8 Architecture Home

Jun 7, 2018


Personalized recommendation has become a standard feature of internet products, yet many people still lack a clear understanding of recommendation technology itself.

Recommendation systems aim to satisfy user needs and are evaluated by accuracy, diversity, novelty, surprise, real‑time updates, transparency, and coverage.

User satisfaction: accuracy is the key metric.

Diversity: balance varied interests.

Novelty: show items the user has not seen before.

Surprise (serendipity): present items the user did not expect but ends up liking.

Real‑time: update recommendations as user context changes.

Transparency: explain why an item is recommended.

Coverage: include long‑tail content.
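Two of these goals are easy to quantify offline. The sketch below is one possible way to measure coverage and diversity; the function names, the category-based notion of diversity, and the toy data are illustrative assumptions, not definitions taken from the article.

```python
def catalog_coverage(recommendation_lists, catalog):
    """Fraction of catalog items that appear in at least one user's list."""
    recommended = set()
    for items in recommendation_lists.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)


def intra_list_diversity(items, category_of):
    """Share of item pairs in one list whose categories differ (0.0 to 1.0)."""
    pairs = [(a, b) for i, a in enumerate(items) for b in items[i + 1:]]
    if not pairs:
        return 0.0
    return sum(category_of[a] != category_of[b] for a, b in pairs) / len(pairs)


# Toy data (hypothetical): two users, an eight-item catalog.
recs = {"u1": ["a", "b", "c"], "u2": ["a", "d"]}
catalog = ["a", "b", "c", "d", "e", "f", "g", "h"]
category_of = {"a": "news", "b": "news", "c": "sports", "d": "music"}

coverage = catalog_coverage(recs, catalog)                  # 4 of 8 items shown
diversity = intra_list_diversity(recs["u1"], category_of)   # 2 of 3 pairs differ
```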

There are four common recommendation approaches: popular recommendation, manual (editorial) recommendation, related‑item recommendation, and personalized recommendation based on user history. The first three account for about 80% of overall recommendations.

Recommendation System Architecture

Online Architecture

Core Modules

Business gateway: entry point that validates requests and assembles responses.

Recommendation engine: handles recall, filtering, feature computation, ranking, and diversification.

Data Flow

1. Requests flow from the gateway through traffic allocation to the business gateway, supporting HTTP, TCP, Thrift, or protobuf interfaces.

2. User behavior data passes from the gateway to a Flume agent, then to Kafka, providing real‑time streams for online profiling and offline storage.
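The article does not say how the traffic allocation step works. A common approach, assumed here purely for illustration, is to hash each user ID into a fixed number of buckets so that the same user always lands in the same experiment arm. The experiment name and the 10% treatment share are hypothetical.

```python
import hashlib


def assign_bucket(user_id, experiment, num_buckets=100):
    """Deterministically map a user to a bucket in [0, num_buckets)."""
    # md5 is used only as a stable hash here, not for security.
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def choose_arm(user_id, experiment="recall_fusion_test", treatment_share=10):
    """Send roughly treatment_share percent of users to the treatment arm."""
    bucket = assign_bucket(user_id, experiment)
    return "treatment" if bucket < treatment_share else "control"


arm = choose_arm("user_42")
```

Salting the hash with the experiment name keeps bucket assignments independent across experiments.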

Offline Architecture

This overview should help interview candidates design a feed (content stream) recommendation system.

From a framework perspective, recommendation systems consist of a data layer, a recall layer, and a ranking layer.

The data layer cleans raw logs into formatted data stored for downstream algorithms, including session logs, user profiles, and item documents.

The recall layer generates candidate sets from historical and real‑time behavior, applies coarse ranking, and filters candidates before passing them to the ranking layer.

The ranking layer uses machine‑learning models to perform fine‑grained ordering of the candidates.
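The recall, filter, and rank flow described above can be sketched as a single function. This is a minimal illustration under stated assumptions: the recall strategies and the scoring function are hypothetical stand-ins passed in as callables, and a real system would run each stage as a separate service.

```python
def recommend(user_id, recallers, seen, score, top_n=3):
    """Recall candidates, filter already-seen items, then rank by model score."""
    # Recall: merge candidates from every strategy, keeping first-seen order.
    candidates = []
    for recall in recallers:
        for item in recall(user_id):
            if item not in candidates:
                candidates.append(item)
    # Filter: drop items the user has already been shown.
    candidates = [item for item in candidates if item not in seen]
    # Rank: order the survivors by the ranking model's score.
    ranked = sorted(candidates, key=lambda item: score(user_id, item), reverse=True)
    return ranked[:top_n]


# Hypothetical recall strategies and a lookup table standing in for a model.
def recall_popular(user_id):
    return ["a", "b", "c"]


def recall_similar(user_id):
    return ["c", "d", "e"]


model_scores = {"a": 0.1, "b": 0.7, "c": 0.4, "d": 0.9, "e": 0.2}

result = recommend(
    "u1",
    [recall_popular, recall_similar],
    seen={"b"},
    score=lambda user_id, item: model_scores[item],
)
# result == ["d", "c", "e"]
```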

Data Features

Data determines features, and features set the upper bound of performance; models strive to approach that bound.

User active behavior (clicks, shares, ratings) is used for both offline candidate generation and as weighted targets in re‑ranking models.

Negative feedback signals results the user was dissatisfied with; it is used to filter or down‑weight items and provides high‑quality negative samples for training.

User profiles (demographics, interests, location, time) provide essential user‑side features for both candidate generation and re‑ranking.

Recall Layer

Collaborative Filtering

Collaborative filtering (CF) is a classic recommendation algorithm, including user‑CF and item‑CF, implemented via memory‑based methods, matrix factorization, or deep neural networks.

Memory‑based CF, such as item‑based CF, computes an item‑item co‑occurrence matrix from click data and derives similarity scores using measures such as Jaccard similarity, cosine similarity, or Euclidean distance.
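As a rough illustration of item‑based CF, the sketch below counts co‑occurrences in binary click data and turns them into cosine similarities. The function names and the toy click log are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(user_clicks):
    """Cosine similarity between items, computed from binary click data."""
    item_count = defaultdict(int)  # users who clicked each item
    co_count = defaultdict(int)    # users who clicked both items of a pair
    for items in user_clicks.values():
        unique = sorted(set(items))
        for item in unique:
            item_count[item] += 1
        for a, b in combinations(unique, 2):
            co_count[(a, b)] += 1
    sims = defaultdict(dict)
    for (a, b), both in co_count.items():
        sim = both / sqrt(item_count[a] * item_count[b])
        sims[a][b] = sim
        sims[b][a] = sim
    return sims


def recommend_items(user, user_clicks, sims, top_n=2):
    """Score unseen items by summed similarity to the user's clicked items."""
    clicked = set(user_clicks[user])
    scores = defaultdict(float)
    for item in clicked:
        for other, sim in sims.get(item, {}).items():
            if other not in clicked:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


clicks = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["b", "c"], "u4": ["a"]}
sims = item_similarities(clicks)
recs_u4 = recommend_items("u4", clicks, sims)  # ["b", "c"]
```

Swapping the cosine formula for Jaccard only changes the denominator to the number of users who clicked either item.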

When the interaction matrix is sparse, memory‑based methods suffer low coverage, prompting the use of matrix factorization (MF). MF decomposes the sparse matrix into low‑rank factors, learning latent vectors that predict missing entries.
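To make the factorization concrete, here is a minimal sketch that learns latent vectors by stochastic gradient descent on observed ratings with L2 regularization. The hyperparameters, the choice of plain SGD, and the toy ratings are assumptions; the article does not specify a training procedure.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mse(P, Q, ratings):
    """Mean squared error over the given ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


# Toy data: only 10 of the 16 user-item cells are observed.
ratings = [
    (0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5),
    (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5),
    (0, 2, 1), (2, 0, 1),
]
P, Q = train_mf(ratings, n_users=4, n_items=4)
missing_entry = predict(P, Q, 1, 2)  # estimate for an unobserved cell
```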

MF is a benchmark for CF but struggles with extreme sparsity and cold‑start problems; deep learning approaches can incorporate semantic content features and user attributes to mitigate these issues.

Content‑Based Recall

Uses content embeddings (e.g., NLP‑derived item vectors) and matches them with user profile weights to rank candidates.
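One way to read this step is sketched below: build a user vector as the weighted average of the tag embeddings in the profile, then rank items by cosine similarity to it. The two‑dimensional vectors, tag names, and item names are made up for illustration, and real embeddings would come from an NLP model.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_vector(interest_weights, tag_vectors):
    """Weighted average of tag embeddings, using the user's profile weights."""
    dim = len(next(iter(tag_vectors.values())))
    total = sum(interest_weights.values())
    vec = [0.0] * dim
    for tag, weight in interest_weights.items():
        for d, x in enumerate(tag_vectors[tag]):
            vec[d] += weight * x / total
    return vec


def content_recall(user_vec, item_vectors, top_n=2):
    """Return the top_n items whose embeddings are closest to the user vector."""
    ranked = sorted(
        item_vectors,
        key=lambda item: cosine(user_vec, item_vectors[item]),
        reverse=True,
    )
    return ranked[:top_n]


tag_vectors = {"sports": [1.0, 0.0], "music": [0.0, 1.0]}
profile = {"sports": 0.8, "music": 0.2}
item_vectors = {
    "match_report": [0.9, 0.1],
    "concert_review": [0.1, 0.9],
    "stadium_gig": [0.5, 0.5],
}
candidates = content_recall(user_vector(profile, tag_vectors), item_vectors)
# candidates == ["match_report", "stadium_gig"]
```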

User‑Group Based Recall

First cluster items (e.g., by clustering word2vec‑style item embeddings) and assign cluster IDs to users based on their interactions, or directly embed users and cluster them (e.g., with k‑means).

Inverted Index

Traverse each user’s tags and quickly retrieve matching items via an inverted index, then select top‑N.
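A minimal in‑memory sketch of tag‑based retrieval follows. The scoring rule (user tag weight times item tag weight, summed over shared tags) is an assumption for illustration; a production system would typically use a search engine or key‑value store rather than a Python dict.

```python
from collections import defaultdict
import heapq


def build_inverted_index(item_tags):
    """Map each tag to the list of (item, weight) pairs that carry it."""
    index = defaultdict(list)
    for item, tags in item_tags.items():
        for tag, weight in tags.items():
            index[tag].append((item, weight))
    return index


def recall_by_tags(user_tags, index, top_n=2):
    """Walk the user's tags, accumulate item scores, and keep the top N."""
    scores = defaultdict(float)
    for tag, user_weight in user_tags.items():
        for item, item_weight in index.get(tag, []):
            scores[item] += user_weight * item_weight
    return heapq.nlargest(top_n, scores, key=scores.get)


item_tags = {
    "i1": {"nba": 1.0},
    "i2": {"nba": 0.5, "jazz": 0.5},
    "i3": {"jazz": 1.0},
    "i4": {"cooking": 1.0},
}
index = build_inverted_index(item_tags)
top_items = recall_by_tags({"nba": 0.7, "jazz": 0.3}, index)
# top_items == ["i1", "i2"]
```

Only items sharing at least one tag with the user are ever touched, which is what makes the lookup fast.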

Sub‑Strategy Fusion

Combines multiple triggering algorithms to improve diversity and coverage. Common fusion methods include weighted, hierarchical, modulation, and filtering approaches.

Our current system integrates modulation and hierarchical fusion: algorithms receive candidate‑set proportions based on historical performance, with higher‑performing algorithms prioritized.
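The combination described above might look like the following sketch. It assumes that "historical performance" is a per‑algorithm click‑through rate and that quotas are proportional to it; the algorithm names, CTR values, and backfill rule are hypothetical.

```python
def fuse_candidates(candidates_by_algo, historical_ctr, total=10):
    """Merge recall results: proportional quotas, best algorithm first."""
    # Hierarchical part: visit algorithms in order of historical performance.
    order = sorted(candidates_by_algo, key=lambda a: historical_ctr[a], reverse=True)
    ctr_sum = sum(historical_ctr[a] for a in order)
    fused, seen = [], set()

    def take(algo, limit):
        taken = 0
        for item in candidates_by_algo[algo]:
            if taken >= limit or len(fused) >= total:
                break
            if item not in seen:
                fused.append(item)
                seen.add(item)
                taken += 1

    # Modulation part: each algorithm fills a quota proportional to its CTR.
    for algo in order:
        take(algo, round(total * historical_ctr[algo] / ctr_sum))
    # Backfill in priority order if duplicates left some slots empty.
    for algo in order:
        take(algo, total)
    return fused


candidates = {
    "item_cf": ["a", "b", "c", "d", "e", "f"],
    "content": ["c", "g", "h", "i"],
    "popular": ["a", "j", "k"],
}
ctr = {"item_cf": 0.05, "content": 0.03, "popular": 0.02}
fused = fuse_candidates(candidates, ctr)
# fused == ["a", "b", "c", "d", "e", "g", "h", "i", "j", "k"]
```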

Model Ranking

Simple algorithmic ordering is insufficient; machine‑learning ranking models aggregate multiple signals for final ordering.

1. Model Selection and Comparison

Non‑linear models capture complex feature interactions but are costlier to train and update; linear models are lightweight and benefit from extensive feature engineering.

Non‑Linear Models

We primarily use gradient‑boosted decision trees (GBDT), which handle non‑linear relationships without extensive feature preprocessing.

Linear Models

Logistic regression is the common choice; we employ online learning with the FTRL (Follow‑the‑Regularized‑Leader) algorithm published by Google to update model weights in real time.

Key steps:

Write feature vectors to HBase.

Storm parses real‑time click and exposure logs, updating labels in HBase.

FTRL updates model weights.

Deploy updated parameters online.
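The weight‑update step can be sketched with the per‑coordinate FTRL‑Proximal rule for logistic regression. This is an illustration under assumptions: in‑memory dicts stand in for HBase, a plain loop stands in for the Storm stream, and the hyperparameters and feature names are made up.

```python
from math import exp, sqrt


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression on sparse features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature accumulated adjusted gradients
        self.n = {}  # per-feature accumulated squared gradients

    def weight(self, feature):
        """Weights are derived lazily from z and n; L1 zeroes small ones."""
        z = self.z.get(feature, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(feature, 0.0)
        return -(z - sign * self.l1) / ((self.beta + sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        """Predicted click probability for a sparse feature dict."""
        s = sum(self.weight(f) * v for f, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + exp(-s))

    def update(self, x, y):
        """Consume one labeled example (y is 1 for click, 0 for no click)."""
        p = self.predict(x)
        for f, v in x.items():
            g = (p - y) * v  # gradient of log loss for this feature
            n = self.n.get(f, 0.0)
            sigma = (sqrt(n + g * g) - sqrt(n)) / self.alpha
            self.z[f] = self.z.get(f, 0.0) + g - sigma * self.weight(f)
            self.n[f] = n + g * g
        return p


model = FTRLProximal()
for _ in range(200):  # stand-in for the real-time click and exposure stream
    model.update({"category=sports": 1.0}, 1)
    model.update({"category=finance": 1.0}, 0)

p_sports = model.predict({"category=sports": 1.0})
p_finance = model.predict({"category=finance": 1.0})
```

Because weights are computed from `z` and `n` on demand, only those two tables need to be persisted and pushed online.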

2. Data

Sampling: address severe class imbalance in click‑through‑rate estimation.

Negative examples: distinguish true negatives from impressions the user never actually viewed. Use the skip‑above strategy (treat only unclicked items displayed above a clicked item as negatives) and explicit user deletions.

Denoising: filter out fraudulent behavior such as click fraud.
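The sampling and negative‑example steps above can be sketched as follows. Dropping lists with no click at all, the 20% keep rate, and the recalibration formula for correcting predictions after downsampling are assumptions added for illustration; the article only names the techniques.

```python
import random


def skip_above_samples(impressions, clicked):
    """Label one displayed list using the skip-above rule.

    Clicked items become positives, unclicked items shown above the last
    click become negatives, and items below it are dropped as possibly unseen.
    """
    click_positions = [pos for pos, item in enumerate(impressions) if item in clicked]
    if not click_positions:
        return []  # assumption: lists without any click yield no samples
    last_click = max(click_positions)
    return [(item, 1 if item in clicked else 0) for item in impressions[: last_click + 1]]


def downsample_negatives(samples, keep_rate, seed=0):
    """Keep every positive and a random keep_rate fraction of negatives."""
    rng = random.Random(seed)
    return [(x, y) for x, y in samples if y == 1 or rng.random() < keep_rate]


def recalibrate(p, keep_rate):
    """Map a probability learned on downsampled data back to the original scale."""
    return p / (p + (1.0 - p) / keep_rate)


labeled = skip_above_samples(["a", "b", "c", "d", "e"], clicked={"c"})
# labeled == [("a", 0), ("b", 0), ("c", 1)]

raw = [(i, 1 if i % 10 == 0 else 0) for i in range(100)]
balanced = downsample_negatives(raw, keep_rate=0.2)
true_ctr = recalibrate(0.5, keep_rate=0.1)  # 1/11, about 0.091
```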

3. Features

Features fall into three categories:

Item‑level features: category, PV, CTR, sub‑category, tags.

User‑level features: level, demographics, client type.

Cross features: user‑item interactions like clicks and favorites.

Non‑linear models can use these features directly; linear models require binning or normalization to produce values in [0,1] or binary form.
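The preprocessing for linear models can be sketched with two small helpers: min‑max scaling into [0,1] and one‑hot binning into binary form. The bucket boundaries and sample values are illustrative assumptions.

```python
from bisect import bisect_right


def min_max_scale(values):
    """Linearly rescale values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_bin(value, boundaries):
    """One-hot encode a value into len(boundaries) + 1 ordered buckets."""
    idx = bisect_right(boundaries, value)
    return [1 if i == idx else 0 for i in range(len(boundaries) + 1)]


scaled_pv = min_max_scale([10, 110, 60])              # [0.0, 1.0, 0.5]
ctr_bucket = one_hot_bin(0.03, [0.01, 0.05, 0.1])     # [0, 1, 0, 0]
```

Binning lets a linear model assign a separate weight to each value range, which recovers some non‑linearity that a single scaled feature cannot express.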

Recommendation System Architecture Interview Summary

Understand the full architecture diagram and the role of each module; be prepared to draw and explain it.

Recall modules perform candidate selection and coarse ranking because running the full ranking model over every item within a 100 ms latency budget is infeasible.

Source: http://www.cnblogs.com/redbear/p/8594939.html
