Mastering Collaborative Filtering: From Traditional Similarity to Deep Neural Models
This article provides a comprehensive technical overview of collaborative filtering, covering traditional user‑ and item‑based similarity methods, matrix‑factorization approaches for implicit feedback, various loss functions, and a suite of deep neural network models such as GMF, MLP, NeuMF, DMF, and ConvMF, together with implementation details, evaluation metrics, and practical deployment considerations.
1. Overview of Collaborative Filtering
Collaborative filtering (CF) has become a cornerstone of modern recommendation systems, with early successes in Google News, Amazon, Hulu, and Netflix. CF typically represents user–item interactions with a rating matrix R, where each entry r_{ij} denotes user i's preference for item j. Because most users interact with only a tiny fraction of items, the matrix is extremely sparse (often >90% empty), which drives the need for specialized algorithms.
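To make the sparsity point concrete, here is a minimal sketch that stores only the observed entries of R and measures how much of the full user × item grid is empty. The toy triples and the helper names (build_rating_matrix, sparsity) are illustrative assumptions, not part of the original article.

```python
# Hypothetical toy data: (user, item, rating) triples.
ratings = [
    ("u1", "i1", 5.0), ("u1", "i3", 3.0),
    ("u2", "i2", 4.0),
    ("u3", "i1", 2.0), ("u3", "i4", 1.0),
]

def build_rating_matrix(triples):
    """Store only observed entries: R[user][item] = rating."""
    R = {}
    for user, item, rating in triples:
        R.setdefault(user, {})[item] = rating
    return R

def sparsity(R):
    """Fraction of the full user x item grid that has no rating."""
    users = len(R)
    items = len({item for row in R.values() for item in row})
    observed = sum(len(row) for row in R.values())
    return 1.0 - observed / (users * items)

R = build_rating_matrix(ratings)
print(f"sparsity = {sparsity(R):.2f}")
```

A dict-of-dicts keeps memory proportional to the number of observed ratings rather than to users × items, which is the usual reason to avoid a dense array here.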
2. Traditional Collaborative Filtering
Item‑Based CF
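The classic item‑based CF computes similarity between items using the column vectors of R. Cosine similarity is a common choice. For two items j and k:
$$\mathrm{sim}(j, k) = \frac{\sum_i r_{ij}\, r_{ik}}{\sqrt{\sum_i r_{ij}^2}\,\sqrt{\sum_i r_{ik}^2}}$$
For a target user u, the predicted score for item j is a similarity‑weighted sum of the user's known ratings over N(u), the set of items rated by u that are most similar to j:
$$\hat{r}_{uj} = \sum_{k \in N(u)} \mathrm{sim}(j, k)\, r_{uk}$$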
Steps:
Compute pairwise item similarity using cosine similarity.
Aggregate scores for each user based on the similarity‑weighted sum of the items they liked.
Rank items by the computed scores and return the top‑K.
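The three steps above can be sketched in plain Python. This is a minimal illustration on made-up data, not a production implementation: the ratings, the function names, and the choice to compute similarities on demand (rather than precomputing the full item-item matrix) are all assumptions.

```python
import math
from collections import defaultdict

# Hypothetical toy ratings: R[user][item] = rating.
R = {
    "alice": {"A": 5.0, "B": 4.0},
    "bob":   {"A": 4.0, "B": 5.0, "C": 5.0},
    "carol": {"B": 1.0, "D": 5.0},
}

def item_vectors(R):
    """Transpose R into item columns: cols[item][user] = rating."""
    cols = defaultdict(dict)
    for user, row in R.items():
        for item, rating in row.items():
            cols[item][user] = rating
    return cols

def cosine(a, b):
    """Step 1: cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(R, user, top_k=2):
    cols = item_vectors(R)
    seen = R[user]
    scores = {}
    for cand in cols:
        if cand in seen:
            continue
        # Step 2: similarity-weighted sum over the items the user rated.
        scores[cand] = sum(cosine(cols[cand], cols[j]) * r for j, r in seen.items())
    # Step 3: rank unseen items by score and keep the top-K.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(recommend(R, "alice"))
```

At realistic scale the pairwise similarities would normally be precomputed and truncated to each item's nearest neighbours, which is where the memory cost mentioned below comes from.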
Limitations of Traditional CF
Statistical method without a learning objective; no optimization of a loss function.
Uses only local similarity information, ignoring global data patterns.
High memory consumption when user or item dimensions are large.
3. Matrix‑Factorization (Latent‑Factor) CF
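To address the above drawbacks, researchers introduced generalized CF based on latent vectors. The classic Latent Factor Model (LFM) factorizes the sparse rating matrix R into two dense matrices P (user factors) and Q (item factors):
$$R \approx P Q^{\top}, \qquad \hat{r}_{ij} = p_i^{\top} q_j$$
where p_i is the latent vector of user i (row i of P) and q_j is the latent vector of item j (row j of Q).
The typical loss is Mean Squared Error (MSE), taken over the set K of observed (user, item) pairs:
$$L = \frac{1}{|K|} \sum_{(i,j) \in K} \left( r_{ij} - p_i^{\top} q_j \right)^2$$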
Latent‑factor models of this kind are widely used in production recommender systems.
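As a rough illustration of fitting P and Q by minimizing squared error on the observed entries, the sketch below uses plain stochastic gradient descent with a small L2 penalty. The toy ratings, the hyperparameters (k, lr, reg, epochs), and the function names are assumptions chosen for the example; the article itself does not specify them.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def mse(ratings, P, Q):
    """Mean squared error over the observed (user, item, rating) triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)

def init_factors(n_users, n_items, k, seed=0):
    """Small random initial factors, seeded for reproducibility."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    return P, Q

def train_sgd(ratings, P, Q, lr=0.02, reg=0.01, epochs=1000):
    """Update P and Q in place, one observed rating at a time."""
    k = len(P[0])
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

# Hypothetical observed ratings as (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
P, Q = init_factors(n_users=3, n_items=3, k=4)
before = mse(ratings, P, Q)
train_sgd(ratings, P, Q)
after = mse(ratings, P, Q)
print(f"MSE before: {before:.3f}, after: {after:.3f}")
```

Only observed entries contribute to the updates, so the cost per epoch scales with the number of ratings rather than with the size of R.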
Explicit vs. Implicit Feedback
Explicit feedback consists of direct user ratings (e.g., IMDb scores). Implicit feedback derives from user behavior such as clicks, likes, or watch time, which is far more abundant in real‑world applications. Implicit feedback requires negative sampling because the absence of interaction does not imply dislike; a common practice is to pair each positive instance with five randomly sampled negatives.
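A minimal sketch of the negative-sampling practice described above, assuming uniform random sampling from the items a user has not interacted with. The data layout (a dict of user to item set) and the function name are illustrative assumptions.

```python
import random

def sample_negatives(positives, all_items, n_neg=5, seed=0):
    """Build (user, item, label) triples: each positive plus n_neg sampled negatives.

    positives: dict mapping user -> set of items the user interacted with.
    """
    rng = random.Random(seed)
    items = sorted(all_items)  # sorted so that sampling is reproducible
    samples = []
    for user, pos_items in positives.items():
        if len(pos_items) >= len(items):
            # No unobserved item exists for this user; keep positives only.
            samples.extend((user, pos, 1) for pos in sorted(pos_items))
            continue
        for pos in sorted(pos_items):
            samples.append((user, pos, 1))
            drawn = 0
            while drawn < n_neg:
                cand = rng.choice(items)
                if cand not in pos_items:  # reject items the user has seen
                    samples.append((user, cand, 0))
                    drawn += 1
    return samples

# Hypothetical interactions.
positives = {"u1": {"a", "b"}, "u2": {"c"}}
all_items = {"a", "b", "c", "d", "e", "f"}
samples = sample_negatives(positives, all_items)
print(len(samples), "training triples")
```

Negatives are typically redrawn every epoch so the model does not memorize one fixed set of "disliked" items; the fixed seed here is only for reproducibility.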
4. Loss Functions for Implicit Feedback
Regression‑Based (MSE) Loss
For explicit feedback, MSE fits all observed ratings. The optimization typically uses Alternating Least Squares (ALS) to iteratively solve for P and Q.
Cross‑Entropy Loss
Implicit feedback can be cast as a binary classification problem: observed user‑item pairs are labeled 1, unobserved pairs 0. The cross‑entropy loss is then minimized, usually with Stochastic Gradient Descent (SGD).
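For reference, the pointwise loss can be written directly on logits. The sketch below uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)), which is algebraically equal to the usual binary cross-entropy with p = sigmoid(z); the function name is an assumption.

```python
import math

def binary_cross_entropy(labels, logits):
    """Mean binary cross-entropy computed from raw logits (stable form)."""
    total = 0.0
    for y, z in zip(labels, logits):
        # Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)

# Observed pairs are labeled 1, sampled negatives 0.
labels = [1.0, 0.0, 0.0]
good_logits = [3.0, -3.0, -3.0]   # confident and correct
bad_logits = [-3.0, 3.0, 3.0]     # confident and wrong
print(binary_cross_entropy(labels, good_logits))
print(binary_cross_entropy(labels, bad_logits))
```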
BPR (Bayesian Personalized Ranking) Loss
BPR is a pair‑wise ranking loss that maximizes the score difference between a positive item and sampled negatives for each user. It directly optimizes ranking quality rather than pointwise error.
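A small sketch of the pairwise objective, assuming the common formulation -sum(log sigmoid(x_ui - x_uj)) over (positive, negative) score pairs for the same user. The function name and inputs are illustrative.

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Sum of -log(sigmoid(pos - neg)) over paired scores."""
    total = 0.0
    for x_i, x_j in zip(pos_scores, neg_scores):
        diff = x_i - x_j
        # -log(sigmoid(d)) = log(1 + exp(-d)), written in a stable form.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total

# The loss shrinks as positives are scored further above their negatives.
print(bpr_loss([2.0, 1.5], [0.0, -1.0]))
print(bpr_loss([0.0, -1.0], [2.0, 1.5]))
```

Because only the difference of scores enters the loss, the absolute scale of a user's scores is unconstrained; what is learned is the ordering.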
5. Neural Network‑Based Collaborative Filtering
Negative Sampling
Deep learning CF models use implicit feedback and therefore require negative sampling. For each positive interaction, five negatives are typically sampled to balance training efficiency and over‑fitting risk.
Optimizer Choice
Both cross‑entropy and BPR loss benefit from adaptive optimizers. Adam is often preferred for its adaptive learning rate and momentum, offering a good trade‑off between convergence speed and model performance.
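To show what "adaptive learning rate and momentum" means in practice, here is a scalar sketch of the standard Adam update with bias correction, applied to a toy quadratic. The objective and the step size used in the demo are assumptions for illustration; the default hyperparameters are the commonly cited ones.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad          # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter step size
    return theta, m, v

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
theta, m, v = 0.0, 0.0, 0.0
first_step = None
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
    if t == 1:
        first_step = theta
print(f"theta after 2000 steps: {theta:.4f}")
```

Note that the very first step has magnitude close to lr regardless of the gradient's scale, which is one reason Adam is less sensitive to the raw magnitude of embedding gradients than plain SGD.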
Generalized Matrix Factorization (GMF)
GMF learns a linear interaction between user and item embeddings:
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# USER_COUNT, MOVIE_COUNT, EMB_SIZE and L2_LAMBDA are constants defined elsewhere.
def model_fn():
    uid = tf.placeholder(tf.int32, shape=[None])
    item = tf.placeholder(tf.int32, shape=[None])
    y = tf.placeholder(tf.float32, shape=[None])
    user_emb = tf.Variable(tf.truncated_normal([USER_COUNT+1, EMB_SIZE], stddev=0.5), name="user_emb")
    item_emb = tf.Variable(tf.truncated_normal([MOVIE_COUNT+1, EMB_SIZE], stddev=0.5), name="movie_emb")
    user_vec = tf.nn.embedding_lookup(user_emb, uid)
    item_vec = tf.nn.embedding_lookup(item_emb, item)
    # Element-wise product of the two embeddings, followed by a learned linear projection.
    layer = tf.multiply(user_vec, item_vec)
    logits = tf.keras.layers.Dense(1, kernel_initializer=tf.random_normal_initializer(stddev=0.5), kernel_regularizer=tf.keras.regularizers.l2(L2_LAMBDA))(layer)
    logits = tf.reshape(logits, (-1,))
    # Labels are clipped to 1.0 so that any positive rating counts as a positive example.
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.minimum(y, 1.0), logits=logits)) + tf.losses.get_regularization_loss()
    return {"score": tf.sigmoid(logits), "loss": loss, "uid": uid, "i": item, "y": y}
Multi‑Layer Perceptron (MLP) Model
MLP captures high‑order non‑linear interactions by concatenating user and item embeddings and feeding them through several dense layers.
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# USER_COUNT, MOVIE_COUNT, EMB_SIZE and L2_RATE are constants defined elsewhere.
def model_fn():
    uid = tf.placeholder(tf.int32, shape=[None])
    item = tf.placeholder(tf.int32, shape=[None])
    y = tf.placeholder(tf.float32, shape=[None])
    keep_prob = tf.placeholder(tf.float32)
    user_emb = tf.Variable(tf.truncated_normal([USER_COUNT+1, EMB_SIZE], stddev=0.01))
    item_emb = tf.Variable(tf.truncated_normal([MOVIE_COUNT+1, EMB_SIZE], stddev=0.01))
    user_vec = tf.nn.embedding_lookup(user_emb, uid)
    item_vec = tf.nn.embedding_lookup(item_emb, item)
    # Concatenate the embeddings; `x` replaces the original name `input`, which shadows a builtin.
    x = tf.concat([user_vec, item_vec], axis=1)
    output_size = EMB_SIZE * 2
    layers = []
    for i in range(2):
        # Each hidden layer halves the width of the previous one.
        output_size = output_size // 2
        l = tf.keras.layers.Dense(output_size, activation=tf.nn.relu,
                                  kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                                  kernel_regularizer=tf.keras.regularizers.l2(L2_RATE),
                                  bias_regularizer=tf.keras.regularizers.l2(L2_RATE))
        layers.append(l)
    for l in layers:
        x = tf.nn.dropout(l(x), keep_prob)
    logits = tf.keras.layers.Dense(1, kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                                   kernel_regularizer=tf.keras.regularizers.l2(L2_RATE),
                                   bias_regularizer=tf.keras.regularizers.l2(L2_RATE))(x)
    logits = tf.reshape(logits, (-1,))
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)) + tf.losses.get_regularization_loss()
    # The item placeholder is exposed as "i" to match the other models; keep_prob is returned so callers can feed it.
    return {"score": tf.sigmoid(logits), "loss": loss, "uid": uid, "i": item, "y": y, "keep_prob": keep_prob}
NeuMF (Neural Matrix Factorization)
NeuMF fuses GMF and MLP by concatenating their final hidden representations and feeding them to a final dense layer. Pre‑training of GMF and MLP embeddings is common.
import tensorflow as tf  # TensorFlow 1.x graph-mode API (tf.contrib is not available in 2.x)

# USER_COUNT, MOVIE_COUNT, MF_EMB_SIZE, MLP_EMB_SIZE and L2_LAMBDA are constants defined elsewhere.
def model_fn():
    uid = tf.placeholder(tf.int32, shape=[None])
    item = tf.placeholder(tf.int32, shape=[None])
    y = tf.placeholder(tf.float32, shape=[None])
    dropout_rate = tf.placeholder(tf.float32)
    mf_user_emb = tf.get_variable('mf_user_emb', initializer=tf.truncated_normal_initializer(stddev=0.1), shape=[USER_COUNT+1, MF_EMB_SIZE])
    mf_item_emb = tf.get_variable('mf_item_emb', initializer=tf.truncated_normal_initializer(stddev=0.1), shape=[MOVIE_COUNT+1, MF_EMB_SIZE])
    mlp_user_emb = tf.get_variable('mlp_user_emb', initializer=tf.truncated_normal_initializer(stddev=0.1), shape=[USER_COUNT+1, MLP_EMB_SIZE])
    mlp_item_emb = tf.get_variable('mlp_item_emb', initializer=tf.truncated_normal_initializer(stddev=0.1), shape=[MOVIE_COUNT+1, MLP_EMB_SIZE])
    # GMF branch: element-wise product of the MF embeddings.
    user_mf = tf.nn.embedding_lookup(mf_user_emb, uid)
    item_mf = tf.nn.embedding_lookup(mf_item_emb, item)
    mf_layer = tf.multiply(user_mf, item_mf)
    # MLP branch: concatenated MLP embeddings passed through two dense layers.
    user_mlp = tf.nn.embedding_lookup(mlp_user_emb, uid)
    item_mlp = tf.nn.embedding_lookup(mlp_item_emb, item)
    x = tf.concat([user_mlp, item_mlp], axis=1)  # the original referenced undefined names mlp_user/mlp_item
    size = MLP_EMB_SIZE  # assumed starting width; the original snippet leaves `size` undefined
    for i in range(2):
        x = tf.keras.layers.Dense(size, activation=tf.nn.relu,
                                  kernel_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
                                  bias_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
                                  kernel_initializer=tf.contrib.layers.xavier_initializer())(x)
        # In TF 1.x the second argument of tf.nn.dropout is the keep probability.
        x = tf.nn.dropout(x, 1 - dropout_rate)
        size = size // 2
    # Fuse both branches and project to a single logit.
    concat = tf.concat([mf_layer, x], axis=1)
    logits = tf.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
                             kernel_initializer=tf.contrib.layers.xavier_initializer(),
                             bias_regularizer=tf.keras.regularizers.l2(L2_LAMBDA))(concat)
    logits = tf.reshape(logits, (-1,))
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)) + tf.losses.get_regularization_loss()
    return {"score": tf.sigmoid(logits), "loss": loss, "uid": uid, "i": item, "y": y, "dropout_rate": dropout_rate}
Deep Matrix Factorization (DMF)
DMF feeds the raw rating matrix rows (users) and columns (items) directly into two separate MLPs, producing dense representations P_i and Q_j. Cosine similarity between these vectors yields the predicted score. The loss is a normalized cross‑entropy that weights samples according to the original rating magnitude.
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# embs is the dense user x item rating matrix; USER_LAYER and ITEM_LAYER list the layer widths
# of the two towers; MIU is a small positive floor; L2_LAMBDA is defined elsewhere.
def model_fn(embs):
    uid = tf.placeholder(tf.int32, shape=[None])
    item = tf.placeholder(tf.int32, shape=[None])
    y = tf.placeholder(tf.float32, shape=[None])
    dropout_rate = tf.placeholder(tf.float32)  # declared in the original but not used below
    # Rows of the rating matrix act as user inputs, columns as item inputs.
    user_emb = tf.convert_to_tensor(embs)
    item_emb = tf.transpose(embs)
    user_vec = tf.nn.embedding_lookup(user_emb, uid)
    item_vec = tf.nn.embedding_lookup(item_emb, item)
    for i, u_size in enumerate(USER_LAYER):
        i_size = ITEM_LAYER[i]
        # The first layer is a plain linear projection (no bias, no activation).
        act = tf.nn.relu if i > 0 else None
        use_bias = i > 0
        user_vec = tf.keras.layers.Dense(u_size, activation=act, use_bias=use_bias,
                                         kernel_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
                                         bias_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
                                         kernel_initializer=tf.truncated_normal_initializer(stddev=0.01))(user_vec)
        item_vec = tf.keras.layers.Dense(i_size, activation=act, use_bias=use_bias,
                                         kernel_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
                                         bias_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
                                         kernel_initializer=tf.truncated_normal_initializer(stddev=0.01))(item_vec)
    # Cosine similarity between the two tower outputs.
    dot = tf.reduce_sum(tf.multiply(user_vec, item_vec), axis=1)
    norm_u = tf.sqrt(tf.reduce_sum(tf.square(user_vec), axis=1))
    norm_i = tf.sqrt(tf.reduce_sum(tf.square(item_vec), axis=1))
    # Floor the score at MIU so that log(y_) stays finite.
    y_ = tf.maximum(MIU, dot / (norm_u * norm_i))
    # Ratings are scaled by the maximum rating (5) to act as soft labels.
    norm_rate = y / 5.0
    loss = tf.reduce_sum(norm_rate * tf.log(y_) + (1.0 - norm_rate) * tf.log(1.0 - y_))
    loss = -loss  # negative log-likelihood
    return {"score": y_, "loss": loss, "uid": uid, "i": item, "y": y}
Convolutional Matrix Factorization (ConvMF)
ConvMF, reported to achieve state‑of‑the‑art performance, represents the outer product of user and item embeddings as a 2‑D interaction matrix and applies a stack of stride‑2 convolutional layers, each of which halves the spatial size until the map collapses to a single feature vector. The resulting score is trained with a BPR loss.
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# USER_COUNT, MOVIE_COUNT, EMB_SIZE, FILTER_COUNT, L2_W and L2_CONV are constants defined elsewhere.
def model_fn():
    uid = tf.placeholder(tf.int32, shape=[None])
    item_i = tf.placeholder(tf.int32, shape=[None])  # positive item
    item_j = tf.placeholder(tf.int32, shape=[None])  # sampled negative item
    keep_prob = tf.placeholder(tf.float32)
    user_emb = tf.Variable(tf.truncated_normal([USER_COUNT+1, EMB_SIZE], stddev=0.01))
    item_emb = tf.Variable(tf.truncated_normal([MOVIE_COUNT+1, EMB_SIZE], stddev=0.01))
    user_vec = tf.nn.embedding_lookup(user_emb, uid)
    i_vec = tf.nn.embedding_lookup(item_emb, item_i)
    j_vec = tf.nn.embedding_lookup(item_emb, item_j)
    w = tf.get_variable('w_', shape=[FILTER_COUNT], initializer=tf.truncated_normal_initializer(stddev=0.1), regularizer=tf.keras.regularizers.l2(L2_W))
    b = tf.get_variable('b_', shape=[1], initializer=tf.random_normal_initializer(stddev=0.1), regularizer=tf.keras.regularizers.l2(L2_W))
    # Six stride-2 layers shrink a map of side up to 64 down to 1x1, so EMB_SIZE must not exceed 64.
    conv_layers = []
    for _ in range(6):
        conv_layers.append(tf.keras.layers.Conv2D(filters=FILTER_COUNT, kernel_size=(2,2), strides=(2,2), padding='SAME', activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.l2(L2_CONV), bias_regularizer=tf.keras.regularizers.l2(L2_CONV)))

    def conv(user_vec, item_vec):
        # Outer product of the embeddings: [batch, EMB_SIZE, EMB_SIZE, 1].
        user_reshape = tf.reshape(user_vec, [-1, EMB_SIZE, 1])
        item_reshape = tf.reshape(item_vec, [-1, 1, EMB_SIZE])
        matrix = tf.expand_dims(tf.matmul(user_reshape, item_reshape), -1)
        x = matrix
        for l in conv_layers:  # the same layers are shared by positive and negative items
            x = tf.nn.dropout(l(x), keep_prob)
        x = tf.reshape(x, [-1, FILTER_COUNT])
        return tf.reduce_sum(tf.multiply(x, w), axis=1) + b

    x_i = conv(user_vec, i_vec)
    x_j = conv(user_vec, j_vec)
    # BPR loss: push the positive score above the negative score.
    loss = tf.reduce_sum(tf.log(tf.sigmoid(x_i - x_j))) * -1 + tf.losses.get_regularization_loss()
    # keep_prob is returned so that training and evaluation code can feed it.
    return {"score": x_i, "loss": loss, "uid": uid, "i": item_i, "j": item_j, "user_vec": user_vec, "keep_prob": keep_prob}
6. Offline Evaluation Methods
Traditional MSE evaluation does not reflect ranking quality. Modern recommender systems use ranking‑aware metrics such as NDCG (Normalized Discounted Cumulative Gain) and Hit Ratio (HR). The leave‑one‑out protocol is common:
Select the most recent positive interaction of each user as the test item.
Sample 100 negative items that the user has never interacted with.
Score all 101 items with the model.
Rank them (e.g., using a heap) and compute NDCG@K and HR@K.
Average the metrics over all users.
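The per-user part of the protocol above can be sketched without any framework. This is a minimal illustration assuming the scores for the 101 candidates are already available as a dict; the function name and the toy scores are assumptions.

```python
import heapq
import math

def hit_and_ndcg(scores, true_item, k=10):
    """Return (HR@k, NDCG@k) for one user.

    scores: dict mapping each candidate item (1 held-out positive plus
    the sampled negatives) to its model score.
    """
    top = heapq.nlargest(k, scores, key=scores.get)
    if true_item in top:
        rank = top.index(true_item)            # 0-based position in the top-k list
        return 1.0, 1.0 / math.log2(rank + 2)  # a single relevant item, so IDCG = 1
    return 0.0, 0.0

# Hypothetical scores: item id doubles as its score, so item 100 ranks first.
scores = {item: float(item) for item in range(101)}
print(hit_and_ndcg(scores, true_item=98))
print(hit_and_ndcg(scores, true_item=50))
```

With exactly one relevant item per user, NDCG reduces to the reciprocal log of the hit position, and HR is simply whether the held-out item made the top-K.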
Example TensorFlow evaluation code:
import datetime
import heapq
import math

# TOP_K is defined elsewhere; `model` is the dict returned by a model_fn that exposes
# "uid", "i", "score" and "keep_prob" (for example the ConvMF model above).
def evaluate(user_data, sess, model):
    ndcg = 0.0
    hr = 0.0
    for u in user_data:
        to_test = user_data[u]['scores']   # list of (uid, item_id, label) candidates
        true_id = user_data[u]['true_id']  # the held-out positive item
        uid, mid, y = zip(*to_test)
        # Dropout is disabled at evaluation time (the original key 'keep_probe' was a typo).
        feed = {model['uid']: uid, model['i']: mid, model['keep_prob']: 1.0}
        score = sess.run(model['score'], feed_dict=feed)
        predict = []
        for i, row in enumerate(to_test):
            predict.append((row[0], row[1], score[i]))
        ranklist = heapq.nlargest(TOP_K, predict, key=lambda r: r[2])
        u_ndcg = 0.0
        u_hr = 0.0
        for i, row in enumerate(ranklist):
            if row[1] == true_id:
                u_ndcg = 1 / math.log2(2 + i)
                u_hr = 1
        ndcg += u_ndcg
        hr += u_hr
    ndcg /= len(user_data)
    hr /= len(user_data)
    print(f"{datetime.datetime.now().isoformat()} ------------------- evaluate ndcg({TOP_K})={ndcg:.5f}, hr({TOP_K})={hr:.5f}")
7. Practical Deployment at Huajiao
Huajiao’s recommendation pipeline follows a classic two‑stage architecture: a recall stage that generates a candidate set of items, followed by a ranking stage that orders the candidates. The recall stage primarily uses the NeuMF model with BPR loss, trained on a large proprietary dataset that far exceeds the size of MovieLens‑1M.
Data preprocessing and candidate generation are performed with Apache Spark on HDFS. Model training leverages a distributed TensorFlow platform (HBOX) on a private cloud to handle the scale and multi‑GPU requirements.
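The recall-then-rank flow described in this section can be illustrated with a toy sketch. Everything here is hypothetical: a dot product over embeddings stands in for the recall model, and an arbitrary scoring function stands in for the ranking model; neither reflects Huajiao's actual implementation.

```python
import heapq

def recall(user_vec, item_vecs, n_candidates):
    """Stage 1: cheap retrieval of a candidate set by embedding dot product."""
    def score(item):
        return sum(a * b for a, b in zip(user_vec, item_vecs[item]))
    return heapq.nlargest(n_candidates, item_vecs, key=score)

def rank(candidates, rank_score, top_k):
    """Stage 2: reorder the small candidate set with a (costlier) ranking score."""
    return sorted(candidates, key=rank_score, reverse=True)[:top_k]

# Hypothetical embeddings and ranking scores.
item_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
user_vec = [1.0, 0.0]
ranking_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}

candidates = recall(user_vec, item_vecs, n_candidates=3)
final = rank(candidates, ranking_scores.get, top_k=2)
print(candidates, final)
```

The split lets the expensive ranking model run on a handful of candidates instead of the full catalogue; note that item "d" never reaches the ranker even though its ranking score is high, which is why recall quality bounds the quality of the whole pipeline.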
8. Conclusion
The article reviewed traditional similarity‑based CF, introduced matrix‑factorization techniques for both explicit and implicit feedback, compared several loss functions (MSE, cross‑entropy, BPR), and detailed a family of deep neural CF models (GMF, MLP, NeuMF, DMF, ConvMF). For each model we provided TensorFlow implementation snippets, discussed negative sampling, optimizer choices, and offline evaluation protocols, offering a practical foundation for researchers and engineers to experiment with modern recommender‑system algorithms.
Huajiao Technology
The Huajiao Technology channel shares the latest Huajiao app tech on an irregular basis, offering a learning and exchange platform for tech enthusiasts.
