Unlocking Recommendation Systems: 10 Classic Machine Learning Algorithms Explained
This article surveys ten classic recommendation system algorithms—including collaborative filtering, association rules, Bayesian methods, K‑Nearest Neighbors, decision trees, random forests, matrix factorization, neural networks, word2vec, and logistic regression—detailing their principles, mathematical formulas, and practical implementation steps for real‑world applications.
Introduction
This article introduces traditional machine-learning algorithms used in recommendation systems, summarizing ten mainstream methods to help practitioners understand the underlying implementations and apply them effectively in business scenarios.
1. Collaborative Filtering (CF)
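CF leverages the principle of shared interests: users who interacted with the same items in the past are likely to share tastes in the future. The item-based version measures the similarity between items i and j as the cosine of their user-rating vectors, sim(i, j) = (r_i · r_j) / (‖r_i‖·‖r_j‖); with binary purchase data this reduces to |N(i) ∩ N(j)| / √(|N(i)|·|N(j)|), where N(i) is the set of users who bought item i. The recommendation score of user u for item j is the similarity-weighted sum of u's ratings on the items u has already rated: p(u, j) = ∑_{i ∈ I(u)} sim(i, j)·r_ui, where I(u) is the set of items rated by u.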
Business practice: for shopping-basket data, compute item-item similarity, then calculate each user's CF score for every item and recommend the top 10.
Step1: Compute item similarity.
Step2: Compute user‑item CF scores.
Step3: Recommend top‑10 items.
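The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the article's code: it assumes implicit 0/1 purchase data (so the cosine similarity reduces to co-purchase counts), and the basket contents and function names (`item_similarity`, `cf_scores`, `recommend`) are hypothetical.

```python
from math import sqrt

# Hypothetical basket data: user -> set of purchased items (implicit 0/1 ratings).
baskets = {
    "u1": {"milk", "bread", "butter"},
    "u2": {"milk", "bread"},
    "u3": {"bread", "butter", "jam"},
    "u4": {"milk", "jam"},
}

def item_similarity(baskets):
    """Step 1: cosine similarity between items over binary user vectors.

    With 0/1 ratings, cos(i, j) = |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|),
    where N(i) is the set of users who bought item i.
    """
    users_of = {}
    for user, items in baskets.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)
    sim = {}
    for i in users_of:
        for j in users_of:
            if i != j:
                common = len(users_of[i] & users_of[j])
                if common:
                    sim[(i, j)] = common / sqrt(len(users_of[i]) * len(users_of[j]))
    return sim

def cf_scores(user, baskets, sim):
    """Step 2: score(u, j) = sum of sim(i, j) * r_ui over items i owned by u (r_ui = 1)."""
    owned = baskets[user]
    scores = {}
    for (i, j), s in sim.items():
        if i in owned and j not in owned:
            scores[j] = scores.get(j, 0.0) + s
    return scores

def recommend(user, baskets, sim, k=10):
    """Step 3: return the top-k unseen items by CF score (ties broken alphabetically)."""
    scores = cf_scores(user, baskets, sim)
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

sim = item_similarity(baskets)
print(recommend("u2", baskets, sim))  # ['butter', 'jam']
```

For u2 (milk, bread), butter ranks ahead of jam mainly because two of bread's three buyers also bought butter.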
2. Association‑Rule Based Recommendation
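Rules of the form X→Y are mined from historical transactions. Support is the fraction of transactions that contain both X and Y, support(X→Y) = count(X ∪ Y)/N; confidence is the conditional probability of Y given X, confidence(X→Y) = support(X ∪ Y)/support(X). Frequent itemsets are normally mined with Apriori or FP-Growth (which builds an FP-tree), but the article presents a simplified rule-generation procedure instead.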
Business practice: using shopping-basket data, prepare the dataset, perform feature crossing, compute support and confidence, filter rules by thresholds, and recommend the items with the highest scores.
Step1: Data preparation.
Step2: Feature crossing.
Step3: Generate rules.
Step4: Recommend items.
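A simplified sketch of the four steps, assuming hypothetical basket data and thresholds: every basket is crossed into item pairs, each pair yields the two candidate rules X→Y and Y→X, and rules below either threshold are discarded. Only single-item rules are generated, so this is not a substitute for Apriori or FP-Growth; the function names are illustrative.

```python
from itertools import combinations

# Step 1 (hypothetical data): each transaction is one shopping basket.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "jam"},
    {"milk", "bread", "jam"},
]

def pair_rules(transactions, min_support=0.2, min_confidence=0.5):
    """Steps 2-3: cross items into pairs and keep rules X -> Y passing both thresholds.

    support(X -> Y)    = count(X and Y) / N
    confidence(X -> Y) = count(X and Y) / count(X)
    """
    n = len(transactions)
    item_count, pair_count = {}, {}
    for basket in transactions:
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1
        # Feature crossing: every unordered pair of items in the basket.
        for a, b in combinations(sorted(basket), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = {}
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                rules[(x, y)] = (support, confidence)
    return rules

def recommend(basket, rules, k=10):
    """Step 4: score unseen items by the best confidence of any rule the basket triggers."""
    scores = {}
    for (x, y), (_, confidence) in rules.items():
        if x in basket and y not in basket:
            scores[y] = max(scores.get(y, 0.0), confidence)
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]
```

With this data, `recommend({"milk"}, pair_rules(transactions))` ranks bread (confidence 0.75) ahead of jam (0.5), while milk→butter (0.25) is filtered out.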
3. Bayesian Recommendation
Bayes' theorem states P(Bi|u) = P(u|Bi)·P(Bi)/P(u). The model uses it to estimate the probability that a user installs an app, given historical installation data.
Business practice: for an app store, estimate P(B) and P(Ai|B) from display and install logs, apply Bayes' formula to rank candidate apps, and recommend the top 10.
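A minimal sketch of this ranking, under assumptions the article does not spell out: B is the event "the displayed app was installed", the Ai are categorical features of a display (here a hypothetical app category and user age group), features are treated as conditionally independent given B (a naive Bayes simplification), and Laplace smoothing avoids zero probabilities. The log rows and the names `fit`, `p_install`, and `recommend` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical display log: (features of one display, 1 if the app was installed).
logs = [
    ({"category": "game", "age": "young"}, 1),
    ({"category": "game", "age": "young"}, 1),
    ({"category": "game", "age": "old"}, 0),
    ({"category": "tool", "age": "old"}, 1),
    ({"category": "tool", "age": "young"}, 0),
    ({"category": "tool", "age": "young"}, 0),
]

def fit(logs, alpha=1.0):
    """Estimate the prior P(B) and smoothed likelihoods P(Ai = v | B) from the logs."""
    class_count = Counter(label for _, label in logs)
    value_count = defaultdict(Counter)   # (feature, label) -> counts of each value
    values = defaultdict(set)            # feature -> set of observed values
    for features, label in logs:
        for name, value in features.items():
            value_count[(name, label)][value] += 1
            values[name].add(value)
    prior = {b: class_count[b] / len(logs) for b in (0, 1)}

    def likelihood(name, value, b):
        return (value_count[(name, b)][value] + alpha) / (class_count[b] + alpha * len(values[name]))

    return prior, likelihood

def p_install(features, prior, likelihood):
    """P(B=1 | A1..An) by Bayes' rule, assuming features are independent given B."""
    joint = {}
    for b in (0, 1):
        p = prior[b]
        for name, value in features.items():
            p *= likelihood(name, value, b)
        joint[b] = p
    return joint[1] / (joint[0] + joint[1])

def recommend(user_features, candidates, prior, likelihood, k=10):
    """Rank candidate apps by posterior install probability and return the top-k."""
    scored = {app: p_install({**user_features, **app_features}, prior, likelihood)
              for app, app_features in candidates.items()}
    return sorted(scored, key=lambda app: (-scored[app], app))[:k]

prior, likelihood = fit(logs)
candidates = {"app_a": {"category": "game"}, "app_b": {"category": "tool"}}
print(recommend({"age": "young"}, candidates, prior, likelihood))
```

Dropping the denominator P(u) would not change the ranking; it is kept here only so that `p_install` returns a proper probability.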
4. K‑Nearest Neighbors (KNN)
KNN classifies an item by the majority class among its k nearest neighbors using a distance metric (e.g., Euclidean, Manhattan, cosine). In recommendation, similarity between users or items is computed, then nearest neighbors are used to generate suggestions.
Business practice: in an app store, after a user downloads an app, recommend the four most similar apps based on one-hot encoded feature vectors.
Step1: Define item vectors.
Step2: Compute distances.
Step3: Recommend nearest items.
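A minimal sketch of the three steps, assuming hypothetical one-hot columns (game, tool, social, free, paid) and Euclidean distance; a Manhattan or cosine distance could replace `euclidean` without changing the rest. App names and functions are illustrative.

```python
from math import sqrt

# Step 1: hypothetical one-hot vectors, columns = [game, tool, social, free, paid].
apps = {
    "chess":    [1, 0, 0, 1, 0],
    "poker":    [1, 0, 0, 0, 1],
    "sudoku":   [1, 0, 0, 1, 0],
    "notes":    [0, 1, 0, 1, 0],
    "calendar": [0, 1, 0, 0, 1],
    "chat":     [0, 0, 1, 1, 0],
}

def euclidean(a, b):
    """Step 2 metric: straight-line distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_apps(downloaded, apps, k=4):
    """Steps 2-3: distance from the downloaded app to every other app, then the k closest."""
    target = apps[downloaded]
    distances = {name: euclidean(target, vec) for name, vec in apps.items() if name != downloaded}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(distances, key=lambda name: (distances[name], name))[:k]

print(nearest_apps("chess", apps))
```

After a user downloads "chess", the identical "sudoku" comes first, followed by three apps that each differ in two one-hot positions; "calendar", which differs in four, is left out.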
5. Decision Tree
Decision trees (ID3, C4.5, CART) split the data recursively using impurity measures: ID3 uses information gain, C4.5 the information gain ratio, and CART the Gini index. Each root-to-leaf path forms a decision rule, and the leaf stores the resulting prediction.
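Impurity calculation: information entropy H(X) = -∑ pi log2 pi and Gini index G = 1 - ∑ pi², where pi is the proportion of samples in class i. At each node, the algorithm picks the split that most reduces impurity under its own criterion.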
Step1: Choose split node.
Step2: Check termination.
Step3: Prune.
Step4: Output tree.
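The impurity formulas and the split, terminate, prune, and output steps can be sketched as a small ID3-style builder for categorical features. This is an illustration, not the article's implementation: the toy rows are hypothetical, pruning is reduced to a minimum-gain threshold (pre-pruning), and post-pruning, C4.5's gain ratio, and CART's binary splits are not shown. Passing `impurity=gini` switches the criterion to Gini reduction.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """G = 1 - sum(p_i ** 2) over class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(rows, labels, feature, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the children of `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return impurity(labels) - sum(len(g) / n * impurity(g) for g in groups.values())

def build_tree(rows, labels, features, impurity=entropy, min_gain=1e-9):
    """Returns a leaf label, or (feature, {value: subtree}) for an inner node."""
    majority = Counter(labels).most_common(1)[0][0]
    # Step 2: stop when the node is pure or no features remain.
    if len(set(labels)) == 1 or not features:
        return majority
    # Step 1: choose the split that reduces impurity the most.
    gain = {f: information_gain(rows, labels, f, impurity) for f in features}
    best = max(features, key=gain.get)
    # Step 3 (pre-pruning only): refuse splits with negligible gain.
    if gain[best] < min_gain:
        return majority
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [f for f in features if f != best], impurity, min_gain)
    return (best, branches)

def predict(tree, row):
    """Step 4 usage: follow branches to a leaf; assumes each value was seen in training."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree

# Hypothetical click data: the label depends only on "active".
rows = [{"age": "young", "active": "yes"}, {"age": "young", "active": "no"},
        {"age": "old", "active": "yes"}, {"age": "old", "active": "no"}]
labels = [1, 0, 1, 0]
tree = build_tree(rows, labels, ["age", "active"])
print(tree)
```

On this toy data, splitting on "active" gives an information gain of 1 bit while "age" gives none, so the tree has a single split.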
6. Random Forest (RF)
RF trains many decision trees, each on a bootstrap sample of the data and using a random subset of the features. Their predictions are combined by averaging the tree outputs (regression) or by majority vote (classification).
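A compact sketch of the bagging and voting logic for classification. To stay short it uses depth-1 trees (decision stumps) on categorical features instead of full decision trees, and draws one random feature subset per tree; both are simplifications, and the names and default parameters are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Depth-1 tree: pick the feature whose per-value majority label makes the fewest errors."""
    best = None
    for f in features:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[f], []).append(label)
        rule = {v: Counter(g).most_common(1)[0][0] for v, g in groups.items()}
        errors = sum(rule[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    default = Counter(labels).most_common(1)[0][0]   # used for values unseen in the sample
    _, feature, rule = best
    return lambda row: rule.get(row[feature], default)

def train_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Each tree sees a bootstrap sample of the rows and a random subset of the features."""
    rng = random.Random(seed)
    features = list(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample rows with replacement
        subset = rng.sample(features, max_features)
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], subset))
    return forest

def predict(forest, row):
    """Classification: majority vote over the individual trees' outputs."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]
```

For regression, `predict` would return the mean of the tree outputs instead of the most common one.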
7. Matrix Factorization
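Matrix factorization approximates the user-item rating matrix U by the product of a user-topic matrix P and a topic-item matrix Q: U ≈ P·Q, where row Pi is user i's latent vector and column Qj is item j's. The loss sums (Uij - Pi·Qj)² over the observed ratings only, plus an L2 regularization term λ(‖Pi‖² + ‖Qj‖²), and is minimized by gradient descent.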
Prediction for unknown items: ŷij = Pi·Qj.
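A minimal stochastic-gradient-descent sketch of this loss in pure Python. The hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative assumptions rather than values from the article, and ratings are passed as (user, item, rating) triples so that missing entries are simply skipped.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn P (n_users x k) and Q (k x n_items) by SGD on the observed entries of U.

    Loss = sum over observed (i, j) of (U_ij - P_i . Q_j)^2 + reg * (|P_i|^2 + |Q_j|^2).
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(n_items)] for _ in range(k)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - sum(P[i][f] * Q[f][j] for f in range(k))
            for f in range(k):
                p, q = P[i][f], Q[f][j]
                # Gradient step on this entry's loss; the constant factor 2 is folded into lr.
                P[i][f] += lr * (err * q - reg * p)
                Q[f][j] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """y_hat_ij = P_i . Q_j, defined even for (i, j) pairs that were never rated."""
    return sum(P[i][f] * Q[f][j] for f in range(len(Q)))

# Hypothetical 3x3 rating matrix with two unknown cells: (0, 2) and (1, 1).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 0, 1), (2, 1, 1), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(predict(P, Q, 0, 2), 2))
```

The printed value is the model's estimate for the unrated cell (0, 2); with so few ratings it mainly illustrates the mechanics.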
8. Back‑Propagation (BP) Neural Network
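A BP network consists of multiple layers: forward propagation computes the outputs layer by layer, back-propagation applies the chain rule to obtain the gradient of the loss with respect to every weight, and gradient descent uses those gradients to update the weights. Typical activation functions are ReLU, sigmoid, and tanh.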
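A from-scratch sketch of forward and backward propagation for a network with one hidden layer, sigmoid activations, and a binary cross-entropy loss, trained with full-batch gradient descent. The layer sizes, learning rate, and class name `TinyMLP` are hypothetical; using ReLU or tanh would change only the hidden activation and its derivative in `delta_h`.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """x -> sigmoid(W1 x + b1) -> sigmoid(w2 . h + b2), trained with binary cross-entropy."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Forward propagation: hidden activations h and output probability y."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, data):
        """Mean binary cross-entropy over (x, target) pairs."""
        total = 0.0
        for x, t in data:
            _, y = self.forward(x)
            total -= t * math.log(y) + (1 - t) * math.log(1 - y)
        return total / len(data)

    def train_step(self, data, lr=0.5):
        """Back-propagation computes the gradients; gradient descent then applies them."""
        n = len(data)
        gW1 = [[0.0] * len(row) for row in self.W1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [0.0] * len(self.w2)
        gb2 = 0.0
        for x, t in data:
            h, y = self.forward(x)
            delta_out = y - t                      # dLoss/dz at the output (sigmoid + cross-entropy)
            gb2 += delta_out
            for k, hk in enumerate(h):
                gw2[k] += delta_out * hk
                delta_h = delta_out * self.w2[k] * hk * (1 - hk)   # chain rule through hidden sigmoid
                gb1[k] += delta_h
                for i, xi in enumerate(x):
                    gW1[k][i] += delta_h * xi
        # Gradient-descent update with batch-averaged gradients (after all deltas are computed).
        for k in range(len(self.w2)):
            self.w2[k] -= lr * gw2[k] / n
            self.b1[k] -= lr * gb1[k] / n
            for i in range(len(self.W1[k])):
                self.W1[k][i] -= lr * gW1[k][i] / n
        self.b2 -= lr * gb2 / n
```

Training on the four XOR points, for example, should steadily lower `net.loss(data)` over a few thousand `train_step` calls, although an unlucky initialization can still stall in a local minimum.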
9. Word2Vec (W2V)
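W2V learns low-dimensional word embeddings with one of two architectures: CBOW (predict the target word from its context) and Skip-gram (predict the context words from the target). Training uses hierarchical softmax over a Huffman tree, which replaces a full softmax over the vocabulary with a sequence of binary decisions along each word's path from the root.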
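The sketch below covers two pieces of the pipeline rather than full embedding training: how CBOW and Skip-gram turn a token sequence into training examples, and how the Huffman tree used by hierarchical softmax assigns binary codes, with shorter codes for frequent words. The window size, toy frequencies, and function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def skipgram_pairs(tokens, window=2):
    """Skip-gram: each target word predicts every word within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: the surrounding context words jointly predict the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        examples.append((context, target))
    return examples

def huffman_codes(frequencies):
    """Build the Huffman tree used by hierarchical softmax and return each word's 0/1 path.

    In hierarchical softmax, one binary classifier at each inner node on a word's
    path replaces the softmax over the whole vocabulary.
    """
    tiebreak = count()   # keeps heap comparisons away from the dict payloads
    heap = [(freq, next(tiebreak), {word: ""}) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {word: "0" for word in frequencies}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens, window=1)[:3])
print(huffman_codes(Counter({"the": 10, "cat": 3, "sat": 2, "mat": 1})))
```

With these toy frequencies, "the" receives a 1-bit code while "sat" and "mat" receive 3-bit codes, so the frequent word needs only one binary decision per training step.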
10. Logistic Regression (LR)
LR models the probability of a binary outcome as σ(w·x + b), where σ(z) = 1/(1+e⁻ᶻ) is the sigmoid function. The model is trained by minimizing the cross-entropy loss with gradient descent.
In recommendation, features (user, item, scene) are one‑hot encoded, possibly crossed, and fed into the LR model to predict click‑through or conversion probability.
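A minimal sketch of this pipeline with sparse one-hot features stored as strings plus one user × item cross, trained by stochastic gradient descent on the cross-entropy loss. The field names, the toy click log, and the learning-rate and epoch settings are hypothetical.

```python
import math

def encode(sample, crosses=(("user", "item"),)):
    """One-hot encode categorical fields as feature names, plus the crossed fields."""
    features = [f"{field}={value}" for field, value in sample.items()]
    for a, b in crosses:
        features.append(f"{a}={sample[a]}&{b}={sample[b]}")
    return features

def predict(weights, bias, features):
    """P(click) = sigmoid(w . x + b); with one-hot x, w . x is the sum of active weights."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.1, epochs=200):
    """SGD on cross-entropy: the gradient for each active weight is (p - y)."""
    weights, bias = {}, 0.0
    encoded = [encode(s) for s in samples]
    for _ in range(epochs):
        for features, y in zip(encoded, labels):
            grad = predict(weights, bias, features) - y
            for f in features:
                weights[f] = weights.get(f, 0.0) - lr * grad
            bias -= lr * grad
    return weights, bias

# Hypothetical click log: users u1 and u2 prefer opposite items.
samples = [
    {"user": "u1", "item": "shoes", "scene": "home"},
    {"user": "u1", "item": "books", "scene": "home"},
    {"user": "u2", "item": "shoes", "scene": "search"},
    {"user": "u2", "item": "books", "scene": "search"},
]
labels = [1, 0, 0, 1]
weights, bias = train(samples, labels)
print([round(predict(weights, bias, encode(s)), 2) for s in samples])
```

In this toy log no weighting of the separate user, item, and scene one-hot fields can fit all four rows (scene always moves with user here), so the crossed `user&item` feature is what makes the clicks separable.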
Overall, the article provides algorithmic foundations, mathematical expressions (as images), and step‑by‑step business implementations for each of the ten recommendation techniques.
This article has been distilled and summarized from source material, then republished for learning and reference.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
