Introduction to User Behavior and Collaborative Filtering in Recommendation Systems
This article explains user behavior concepts and feedback types, introduces collaborative filtering methods including user‑based, item‑based and latent factor models, discusses similarity measures, power‑law distributions, and practical considerations such as negative sampling, providing a comprehensive overview for building recommendation systems.
User Behavior Introduction
Collaborative filtering relies on user behavior to generate recommendations. User feedback falls into two main categories: explicit feedback (e.g., ratings, likes/dislikes) and implicit feedback (e.g., page views). Feedback can also be classified as positive or negative.
Representing diverse online behaviors—such as browsing, purchasing, commenting, and rating—in a unified way is challenging; a possible representation is illustrated in the accompanying diagram.
User Behavior Analysis
Two variables are defined:
User activity: total number of items a user has interacted with.
Item popularity: total number of users who have interacted with an item.
Both user activity and item popularity follow a power-law (long-tail) distribution. The more active a user is, the more likely they are to have browsed niche, long-tail items.
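As a quick sketch of how item popularity can be measured in practice, the snippet below counts interactions per item from a hypothetical log of (user, item) pairs; ranking the counts exposes the long-tail shape (a few head items absorb most interactions). The log data here is invented for illustration.

```python
from collections import Counter

def popularity_counts(interactions):
    """Count item popularity: number of interactions each item received."""
    return Counter(item for _, item in interactions)

# Hypothetical interaction log of (user, item) pairs.
logs = [("u1", "A"), ("u2", "A"), ("u3", "A"),
        ("u1", "B"), ("u2", "B"), ("u1", "C")]
pop = popularity_counts(logs)

# Sorting by descending popularity exposes the long tail:
# a small head of popular items followed by many rarely seen ones.
ranked = sorted(pop.items(), key=lambda kv: -kv[1])
```

User activity can be computed symmetrically by counting over the user field instead of the item field.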
Recommendation algorithms based solely on user behavior are generally called collaborative filtering. Research has produced various approaches, such as neighborhood‑based methods, latent factor models, and graph‑based random walk algorithms.
Neighborhood‑Based Algorithms
Neighborhood methods are divided into two major categories:
User‑based collaborative filtering: recommends items liked by users with similar interests.
Item‑based collaborative filtering: recommends items similar to those the target user already likes.
User‑Based Collaborative Filtering
The process involves two steps:
Identify a set of users whose interests are similar to the target user.
Recommend items liked by this set that the target user has not yet encountered.
Similarity can be measured using Euclidean distance, Pearson correlation, Cosine similarity, or Tanimoto coefficient, each affecting results differently.
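Two of the measures named above can be sketched on sparse rating vectors (dicts mapping item to rating); the helper names and toy data are illustrative, not from the original text. Cosine similarity uses the rating values, while the Tanimoto (Jaccard) coefficient only looks at which items overlap, which suits implicit feedback.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(a) & set(b)
    num = sum(a[i] * b[i] for i in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient on the sets of items each user touched."""
    inter = len(set(a) & set(b))
    union = len(set(a) | set(b))
    return inter / union if union else 0.0
```

With either measure, the target user's neighbors are the K users with the highest similarity score; the choice of measure changes which users end up in that neighborhood.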
Item‑Based Collaborative Filtering
Item‑based collaborative filtering evaluates similarity between items based on user ratings, then recommends items similar to those the user previously liked.
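A common way to compute item-item similarity from implicit feedback is co-occurrence counting normalized by each item's popularity; the sketch below assumes user histories are sets of items (the sample histories are invented for illustration).

```python
import math
from collections import defaultdict

def item_similarity(user_items):
    """ItemCF similarity: co-occurrence normalized by sqrt(N(i) * N(j))."""
    count = defaultdict(int)                     # N(i): users who touched item i
    co = defaultdict(lambda: defaultdict(int))   # co-occurrence counts
    for items in user_items.values():
        for i in items:
            count[i] += 1
            for j in items:
                if i != j:
                    co[i][j] += 1
    sim = defaultdict(dict)
    for i, related in co.items():
        for j, cij in related.items():
            # Dividing by the geometric mean of popularities dampens the
            # influence of very popular items that co-occur with everything.
            sim[i][j] = cij / math.sqrt(count[i] * count[j])
    return sim

# Hypothetical user histories (implicit feedback: sets of items).
histories = {"u1": {"A", "B"}, "u2": {"A", "B", "C"}, "u3": {"B", "C"}}
sim = item_similarity(histories)
```

Recommendations for a user are then scored by summing similarities between candidate items and the items in that user's history.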
Comparison of UserCF and ItemCF
In e‑commerce, the number of users usually far exceeds the number of items, making ItemCF computationally cheaper. For non‑social sites, content similarity is a stronger recommendation signal than user similarity. In social networks, UserCF (especially when combined with social information) can provide more explainable recommendations.
Latent Factor Model (LFM)
The latent semantic model originated in text mining (LSI, pLSA, LDA, Topic Model) and was later adapted for recommendation via matrix factorization. Traditional SVD is computationally intensive for large datasets; Funk‑SVD (also called Latent Factor Model) improves scalability.
Matrix factorization represents the user-item interaction matrix R as the product of two lower-dimensional matrices: P (user-topic) and Q (topic-item). Each entry R_ij denotes user i's interest in item j. Missing values can be initialized with the average rating.
When the matrix is large, SVD becomes slow, so gradient descent is used to learn P and Q. The update rules (shown in the diagrams) involve three hyper-parameters:
Number of latent factors F .
Learning rate alpha .
Regularization parameter lambda .
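The training loop can be sketched as follows; this is a minimal stochastic-gradient-descent version of Funk-SVD with the three hyper-parameters above (F, alpha, lambda), using standard L2-regularized update rules rather than the exact formulas from the original diagrams. All function names and toy data are illustrative.

```python
import random

def train_lfm(samples, n_users, n_items, F=10, alpha=0.02, lam=0.01, epochs=50):
    """Funk-SVD / LFM trained by stochastic gradient descent.
    samples: list of (u, i, r) triples, r being the observed interest (1 or 0)."""
    random.seed(0)
    # Small random initialization for the user-factor and item-factor matrices.
    P = [[random.random() / F for _ in range(F)] for _ in range(n_users)]
    Q = [[random.random() / F for _ in range(F)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in samples:
            pred = sum(P[u][f] * Q[i][f] for f in range(F))
            err = r - pred
            for f in range(F):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                P[u][f] += alpha * (err * qi - lam * pu)
                Q[i][f] += alpha * (err * pu - lam * qi)
    return P, Q

# Tiny illustrative run: one user who liked item 0 and ignored item 1.
samples = [(0, 0, 1.0), (0, 1, 0.0)]
P, Q = train_lfm(samples, n_users=1, n_items=2, F=5)
```

After training, user u's predicted interest in item i is the dot product of P[u] and Q[i], so the liked item should score higher than the ignored one.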
Since only positive interactions are observed (implicit feedback), negative samples (R_ij = 0) must be generated. The sampling strategy keeps positive and negative samples balanced per user and draws negatives from popular items the user has not interacted with, because an obscure item may simply be undiscovered rather than disliked.
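One possible sketch of that sampling strategy, assuming items are pre-sorted by descending popularity so that drawing from the head of the list favors popular items (function name and data are hypothetical):

```python
import random

def sample_negatives(user_items, item_pool, ratio=1, seed=0):
    """Label each user's observed items 1 and pair them with unseen items labeled 0.
    item_pool: all items sorted by descending popularity, so sampling from the
    head biases negatives toward popular-but-unseen items."""
    rng = random.Random(seed)
    samples = []
    for user, positives in user_items.items():
        for item in positives:
            samples.append((user, item, 1))
        negatives = [i for i in item_pool if i not in positives]
        # Balance: roughly `ratio` negatives per positive, capped by availability.
        n_neg = min(len(positives) * ratio, len(negatives))
        # Restrict to the popular head of the unseen items before sampling.
        head = negatives[:max(n_neg * 3, n_neg)]
        for item in rng.sample(head, n_neg):
            samples.append((user, item, 0))
    return samples
```

The resulting (user, item, label) triples can be fed directly to an LFM trainer as its positive/negative samples.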
Summary
This article introduced the fundamentals of user behavior, explicit/implicit feedback, and positive/negative feedback. It then described two major families of recommendation algorithms: neighborhood‑based methods and latent semantic models. The next article will demonstrate how to apply these algorithms using the Surprise library.
Source: https://www.zybuluo.com/zhuanxu/note/985025