Deep Collaborative Filtering Models and Their Implementation in Recommender Systems
This article surveys traditional and deep learning based collaborative filtering techniques—including similarity methods, matrix factorization, explicit and implicit feedback handling, various loss functions, evaluation metrics, and TensorFlow implementations of GMF, MLP, NeuMF, DMF, and ConvMF models—providing practical guidance for building large‑scale recommender systems.
Collaborative filtering (CF) is a cornerstone of recommender systems; it operates on a rating matrix R in which each entry r_ij represents user i's preference for item j.
Traditional CF comprises user‑based and item‑based neighborhood methods, which compute similarity between users or items (typically cosine similarity) and predict a score by aggregating the ratings of the most similar neighbors.
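A minimal NumPy sketch of the item‑based variant under these definitions (the toy matrix, the zero‑means‑unrated convention, and the function names are illustrative, not from the article):

```python
import numpy as np

def cosine_item_sim(R):
    """Column-wise cosine similarity between items of a user-item matrix."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    norms[norms == 0] = 1.0            # guard against never-rated items
    normalized = R / norms
    return normalized.T @ normalized   # (n_items, n_items) similarity matrix

def predict_item_based(R, user, item):
    """Predict R[user, item] as a similarity-weighted average over the
    items this user has already rated."""
    sim = cosine_item_sim(R)
    rated = np.nonzero(R[user])[0]     # indices of the user's rated items
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ R[user, rated] / weights.sum())

# Toy 3-user x 3-item rating matrix (0 = unrated)
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 4.0],
              [1.0, 1.0, 5.0]])
prediction = predict_item_based(R, user=0, item=2)
```

Note that the prediction stays inside the observed rating scale because it is a convex combination of the user's existing ratings.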
Latent‑factor approaches replace statistical similarity with learned embeddings: matrix factorization (MF) decomposes R into dense user and item factor matrices and minimizes mean‑squared error (MSE), classically via alternating least squares (ALS).
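To make the ALS step concrete, here is a small sketch on a toy explicit‑ratings matrix; the rank, regularization weight, and iteration count are illustrative choices, not values from the article. Each half‑step fixes one factor matrix and solves a regularized least‑squares problem for the other in closed form:

```python
import numpy as np

def als(R, mask, k=2, reg=0.1, iters=20, seed=0):
    """Alternating least squares for masked MSE + L2 regularization:
    fix item factors V, solve for user factors U in closed form, then swap."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(iters):
        for u in range(n_users):
            idx = mask[u].nonzero()[0]              # items rated by user u
            A = V[idx].T @ V[idx] + reg * np.eye(k)
            U[u] = np.linalg.solve(A, V[idx].T @ R[u, idx])
        for i in range(n_items):
            idx = mask[:, i].nonzero()[0]           # users who rated item i
            A = U[idx].T @ U[idx] + reg * np.eye(k)
            V[i] = np.linalg.solve(A, U[idx].T @ R[idx, i])
    return U, V

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 4.0],
              [1.0, 1.0, 5.0]])
mask = R > 0                                        # only observed entries count
U, V = als(R, mask)
rmse = np.sqrt(((U @ V.T - R)[mask] ** 2).mean())
```

The regularization term keeps each per‑user and per‑item subproblem well‑conditioned even when a user has fewer ratings than the latent dimension.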
Explicit feedback (direct ratings) is distinguished from implicit feedback (clicks, views); because implicit data contains no observed negatives, it requires negative sampling and often weighted loss functions.
Loss functions: MSE for explicit MF, binary cross‑entropy for implicit feedback, and Bayesian Personalized Ranking (BPR) for pair‑wise ranking, each optimized with stochastic gradient descent (SGD) or Adam.
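The BPR objective can be written down in a few lines: for each (positive, sampled negative) pair it penalizes −log σ(s_pos − s_neg), so the model is pushed to score observed items above unobserved ones. A plain‑Python sketch (framework‑free for clarity; the article's models compute the scores with TensorFlow):

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Pairwise BPR loss, averaged over (positive, negative) score pairs.
    Uses -log sigmoid(x) == log(1 + exp(-x)) for numerical clarity."""
    total = sum(math.log1p(math.exp(-(sp - sn)))
                for sp, sn in zip(pos_scores, neg_scores))
    return total / len(pos_scores)

loss_small_margin = bpr_loss([1.0], [0.0])
loss_large_margin = bpr_loss([3.0], [0.0])
```

Because only the score difference enters the loss, BPR optimizes ranking directly rather than absolute scores, which matches top‑K evaluation better than point‑wise losses.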
Offline evaluation employs leave‑one‑out testing with metrics such as normalized discounted cumulative gain (nDCG) and hit ratio (HR), selecting the top‑K items per user.
TensorFlow implementation of the evaluation procedure:
import datetime
import heapq
import math

def evaluate(user_data, sess, model):
    ndcg = 0.0
    hr = 0.0
    for u in user_data:
        to_test = user_data[u]['scores']      # candidate (uid, item, label) rows
        true_id = user_data[u]['true_id']     # the held-out positive item
        uid, mid, y = zip(*to_test)
        feed_dic = {model['uid']: uid, model['i']: mid, model['keep_probe']: 1.0}
        score = sess.run(model['score'], feed_dict=feed_dic)
        # rank the candidates by predicted score and keep the top K
        ranklist = heapq.nlargest(
            TOP_K,
            [(row[0], row[1], score[i]) for i, row in enumerate(to_test)],
            key=lambda r: r[2])
        u_ndcg = u_hr = 0.0
        for i, row in enumerate(ranklist):
            if row[1] == true_id:             # held-out item found at rank i
                u_ndcg = 1 / math.log2(2 + i)
                u_hr += 1
        ndcg += u_ndcg
        hr += u_hr
    ndcg /= len(user_data)
    hr /= len(user_data)
    print("%s ------------------- evaluate ndcg(10)=%.5f, hr(10)=%.5f"
          % (datetime.datetime.now().isoformat(), ndcg, hr))

Model families:
Generalized Matrix Factorization (GMF) – linear interaction of user and item embeddings.
Multi‑Layer Perceptron (MLP) – concatenated embeddings processed by deep neural layers.
Neural Matrix Factorization (NeuMF) – combines GMF and MLP, often pretrained and fine‑tuned with Adam.
Deep Matrix Factorization (DMF) – feeds raw rating vectors into parallel MLPs and computes cosine similarity.
Convolutional Matrix Factorization (ConvMF) – applies outer‑product interaction maps and CNN layers, trained with BPR loss.
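To make the ConvMF interaction map concrete, here is a toy sketch (the embedding size and vectors are illustrative): the outer product of a user and an item embedding forms an E×E map that CNN layers can scan, and its diagonal is exactly the element‑wise product that GMF uses, so the map strictly generalizes the GMF interaction.

```python
import numpy as np

def interaction_map(user_vec, item_vec):
    """E x E outer-product interaction map; CNN layers scan this 'image'
    for higher-order correlations between embedding dimensions."""
    return np.outer(user_vec, item_vec)

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
m = interaction_map(u, v)
# diagonal of the map == GMF's element-wise product of the embeddings
diag_equals_gmf = np.allclose(np.diag(m), u * v)
```

The off‑diagonal entries are what the convolution layers add over GMF: cross‑dimension interactions that a purely element‑wise model never sees.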
All models are implemented in TensorFlow; example GMF definition:
def model_fn():
    uid = tf.placeholder(tf.int32, shape=[None])
    item = tf.placeholder(tf.int32, shape=[None])
    y = tf.placeholder(tf.float32, shape=[None])
    user_emb = tf.Variable(tf.truncated_normal([USER_COUNT + 1, EMB_SIZE], stddev=0.5),
                           name="user_emb")
    item_emb = tf.Variable(tf.truncated_normal([MOVIE_COUNT + 1, EMB_SIZE], stddev=0.5),
                           name="movie_emb")
    user_vec = tf.nn.embedding_lookup(user_emb, uid)
    item_vec = tf.nn.embedding_lookup(item_emb, item)
    layer = tf.multiply(user_vec, item_vec)   # element-wise (GMF) interaction
    logits = tf.keras.layers.Dense(
        1,
        kernel_initializer=tf.random_normal_initializer(stddev=0.5),
        kernel_regularizer=tf.keras.regularizers.l2(L2_LAMBDA),
        bias_initializer=tf.random_normal_initializer(stddev=0.5),
        bias_regularizer=tf.keras.regularizers.l2(L2_LAMBDA))(layer)
    logits = tf.reshape(logits, (-1,))
    # clamp labels to {0, 1} for the binary cross-entropy loss
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.minimum(y, 1.0),
                                                logits=logits))
    loss += tf.losses.get_regularization_loss()
    score = tf.sigmoid(logits)
    return {"score": score, "loss": loss, "uid": uid, "i": item, "y": y}

In production, Huajiao’s recommender uses NeuMF with BPR loss for recall, Spark for data preprocessing, HDFS for storage, and the HBOX distributed training platform for large‑scale TensorFlow training.
The article concludes with a summary of the evolution from similarity‑based CF to deep latent‑factor models and provides references for further reading.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.