Step-by-Step Guide to Building a Movie Recommendation System with TensorFlow

This tutorial walks through loading and cleaning the MovieLens dataset, building the rating and record matrices, normalizing ratings, defining a collaborative-filtering model in TensorFlow, training it with the Adam optimizer, evaluating the result, and finally generating personalized movie recommendations for a chosen user.

Python Programming Learning Circle

Step 1: Collect and Clean Data

Download the MovieLens dataset (ml‑latest‑small) and load the ratings.csv and movies.csv files using pandas.

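import pandas as pd
import numpy as np
# The code below uses the TensorFlow 1.x graph API (tf.Session, tf.train.AdamOptimizer).
# On TensorFlow 2.x, the compat.v1 module keeps those names available.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

ratings_df = pd.read_csv('./ml-latest-small/ratings.csv')
movies_df = pd.read_csv('./ml-latest-small/movies.csv')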

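Add a movieRow column that holds each movie's row index. Unlike movieId, which has gaps, movieRow runs from 0 to N-1 and can index a matrix directly. Keep only the columns movieRow, movieId and title, then save the processed movies file.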

movies_df['movieRow'] = movies_df.index
movies_df = movies_df[['movieRow','movieId','title']]
movies_df.to_csv('./ml-latest-small/moviesProcessed.csv', index=False, header=True, encoding='utf-8')

Merge the ratings with the processed movies on movieId and keep only userId, movieRow and rating. Save the processed ratings.

ratings_df = pd.merge(ratings_df, movies_df, on='movieId')
ratings_df = ratings_df[['userId','movieRow','rating']]
ratings_df.to_csv('./ml-latest-small/ratingsProcessed.csv', index=False, header=True, encoding='utf-8')
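To make the effect of the merge concrete, here is a minimal sketch on made-up data. The two small DataFrames are hypothetical stand-ins for movies.csv and ratings.csv; only the column names come from the steps above.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (all values are made up).
movies = pd.DataFrame({
    'movieId': [1, 50, 296],
    'title': ['Movie A', 'Movie B', 'Movie C'],
})
ratings = pd.DataFrame({
    'userId': [1, 1, 2],
    'movieId': [296, 1, 50],
    'rating': [4.0, 3.5, 5.0],
})

# movieRow is a dense 0..N-1 index, unlike the sparse movieId values.
movies['movieRow'] = movies.index

# The join attaches movieRow to every rating through the shared movieId.
merged = pd.merge(ratings, movies, on='movieId')[['userId', 'movieRow', 'rating']]
merged = merged.sort_values(['userId', 'movieRow']).reset_index(drop=True)
print(merged)
```

After the merge, each rating is addressed by (userId, movieRow), which is what the matrix construction in the next step relies on.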

Step 2: Create Rating and Record Matrices

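Build a movieNo × userNo rating matrix, then a record matrix that marks which entries hold an actual rating (1) and which are empty (0).

# +1 because both userId and movieRow are used directly as zero-based indices
userNo = ratings_df['userId'].max() + 1
movieNo = ratings_df['movieRow'].max() + 1
rating = np.zeros((movieNo, userNo))
for index, row in ratings_df.iterrows():
    rating[int(row['movieRow']), int(row['userId'])] = row['rating']
# MovieLens ratings start at 0.5, so 0 can only mean "not rated"
record = (rating > 0).astype(int)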
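Looping with iterrows works but can be slow on larger rating files. As an alternative sketch, the same matrices can be filled in one step with NumPy integer-array indexing. The three arrays below are hypothetical stand-ins for the movieRow, userId and rating columns of ratings_df.

```python
import numpy as np

# Hypothetical (movieRow, userId, rating) triples standing in for ratings_df.
movie_rows = np.array([0, 2, 1])
user_ids = np.array([1, 1, 2])
values = np.array([3.5, 4.0, 5.0])

movie_no = movie_rows.max() + 1   # rows 0..2
user_no = user_ids.max() + 1      # columns 0..2; column 0 stays empty here

rating = np.zeros((movie_no, user_no))
# Integer-array indexing assigns every (row, column) pair at once.
rating[movie_rows, user_ids] = values
record = (rating > 0).astype(int)
```

With the real data, the arrays would come from ratings_df['movieRow'].values, ratings_df['userId'].values and ratings_df['rating'].values. This assumes each (user, movie) pair appears only once, which should be checked for the dataset in use.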

Step 3: Build the Model

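Normalize ratings by subtracting each movie's mean rating, computed only over the users who actually rated it. A movie with no ratings yields a NaN mean, which np.nan_to_num replaces with 0 afterwards.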

def normalizeRatings(rating, record):
    m, n = rating.shape
    rating_mean = np.zeros((m, 1))
    rating_norm = np.zeros((m, n))
    for i in range(m):
        idx = record[i, :] != 0
        rating_mean[i] = np.mean(rating[i, idx])
        rating_norm[i, idx] = rating[i, idx] - rating_mean[i]
    return rating_norm, rating_mean

rating_norm, rating_mean = normalizeRatings(rating, record)
rating_norm = np.nan_to_num(rating_norm)
rating_mean = np.nan_to_num(rating_mean)
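The same normalization can also be written without the per-movie loop. The following is a minimal sketch, not the article's implementation: normalize_ratings is a hypothetical name, and it sets the mean of unrated movies to 0 directly instead of relying on np.nan_to_num.

```python
import numpy as np

def normalize_ratings(rating, record):
    """Subtract each movie's mean over rated entries; unrated entries stay 0."""
    counts = record.sum(axis=1, keepdims=True)
    sums = (rating * record).sum(axis=1, keepdims=True)
    # Movies with no ratings get a mean of 0 instead of NaN.
    rating_mean = np.divide(
        sums, counts,
        out=np.zeros_like(sums, dtype=float),
        where=counts > 0,
    )
    rating_norm = (rating - rating_mean) * record
    return rating_norm, rating_mean

rating = np.array([[5.0, 3.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [4.0, 0.0, 2.0]])
record = (rating > 0).astype(int)
rating_norm, rating_mean = normalize_ratings(rating, record)
```

In this toy matrix the first movie has mean 4 and the third has mean 3, so their rated entries become +1 and -1, while the unrated middle movie stays all zeros.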

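Define the model parameters and the loss function. X_parameters holds one feature vector per movie and Theta_parameters one per user. The loss is the squared error over rated entries only (the record mask zeroes out the rest) plus an L2 penalty on both parameter matrices.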

num_features = 10
X_parameters = tf.Variable(tf.random_normal([movieNo, num_features], stddev=0.35))
Theta_parameters = tf.Variable(tf.random_normal([userNo, num_features], stddev=0.35))
loss = 0.5 * tf.reduce_sum(((tf.matmul(X_parameters, Theta_parameters, transpose_b=True) - rating_norm) * record) ** 2) \
       + 0.5 * (tf.reduce_sum(X_parameters ** 2) + tf.reduce_sum(Theta_parameters ** 2))
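For checking the loss by hand, here is a NumPy sketch of the same expression on tiny matrices. The function name cofi_loss and the lam argument are assumptions added for illustration; the TensorFlow expression above corresponds to lam = 1.

```python
import numpy as np

def cofi_loss(X, Theta, rating_norm, record, lam=1.0):
    """Regularized squared error, counting rated entries only."""
    err = (X @ Theta.T - rating_norm) * record
    penalty = np.sum(X ** 2) + np.sum(Theta ** 2)
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * penalty

X = np.array([[1.0, 0.0], [0.0, 1.0]])        # 2 movies, 2 features
Theta = np.array([[1.0, 1.0], [2.0, 0.0]])    # 2 users, 2 features
rating_norm = np.array([[0.5, 0.0], [0.0, -1.0]])
record = np.array([[1, 0], [0, 1]])

loss_value = cofi_loss(X, Theta, rating_norm, record)
print(loss_value)  # 0.625 (data term) + 4.0 (penalty) = 4.625
```

Only the two masked-in entries contribute to the data term; the two unrated entries are ignored no matter what the model predicts for them.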

Use the Adam optimizer to minimize the loss.

optimizer = tf.train.AdamOptimizer(1e-4)
train = optimizer.minimize(loss)
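To give a feel for what the optimizer does at each step, here is a sketch of the textbook Adam update in NumPy. It is an illustration under stated assumptions, not TensorFlow's code: adam_step is a hypothetical helper, the defaults for beta1, beta2 and eps are the commonly quoted ones, and TensorFlow's own implementation arranges the bias correction and epsilon slightly differently.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
start = 0.5 * np.sum(w ** 2)
for t in range(1, 101):
    w, m, v = adam_step(w, w, m, v, t, lr=0.01)
end = 0.5 * np.sum(w ** 2)
```

Because each update is scaled by the running gradient magnitude, early steps move every parameter by roughly lr regardless of how large its gradient is. That is one reason a rate as small as 1e-4 needs thousands of iterations.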

Step 4: Train the Model

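Run 5,000 training steps and log the loss at every step so it can be inspected in TensorBoard (tensorboard --logdir ./movie_tensorboard).

tf.summary.scalar('loss', loss)
summaryMerged = tf.summary.merge_all()
writer = tf.summary.FileWriter('./movie_tensorboard')
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(5000):
    # One optimizer step plus the loss summary for this step
    _, movie_summary = sess.run([train, summaryMerged])
    writer.add_summary(movie_summary, i)
writer.flush()  # make sure buffered summaries reach disk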

Step 5: Evaluate the Model

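Current_X_parameters, Current_Theta_parameters = sess.run([X_parameters, Theta_parameters])
# Add the per-movie means back to return to the original rating scale
predicts = np.dot(Current_X_parameters, Current_Theta_parameters.T) + rating_mean
# Square root of the summed squared error over every matrix entry, rated or not
errors = np.sqrt(np.sum((predicts - rating) ** 2))

The error obtained is approximately 4037.90. Note that this is not a root-mean-square error: the sum is never divided by the number of entries, and it includes unrated entries, where rating is 0. The value is therefore useful mainly for comparing runs of this same script.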
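The errors value computed in this step sums over every entry of the matrix, including unrated ones, and is not divided by a count. A per-rating RMSE restricted to rated entries can be sketched as follows; masked_rmse is a hypothetical helper, and the small arrays are made-up stand-ins for predicts, rating and record.

```python
import numpy as np

def masked_rmse(predicts, rating, record):
    """RMSE over rated entries only (record == 1)."""
    squared_error = ((predicts - rating) * record) ** 2
    return np.sqrt(squared_error.sum() / record.sum())

rating = np.array([[5.0, 0.0], [0.0, 3.0]])
record = (rating > 0).astype(int)
predicts = np.array([[4.0, 2.5], [3.5, 4.0]])

rmse = masked_rmse(predicts, rating, record)            # error per known rating
unmasked = np.sqrt(np.sum((predicts - rating) ** 2))    # same form as errors in Step 5
print(rmse, unmasked)
```

Since the tutorial holds out no ratings, a figure computed this way on the real data would still be a training error and says little about how well the model generalizes.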

Step 6: Build the Complete Recommendation System

user_id = int(input('Enter the user ID you want to recommend for: '))
# Movie rows ordered from highest to lowest predicted score for this user
sortedResult = predicts[:, user_id].argsort()[::-1]
print('===== Top 20 recommended movies for this user ====='.center(80, '='))
for i in sortedResult[:20]:
    print('Score: %.2f, Movie: %s' % (predicts[i, user_id], movies_df.iloc[i]['title']))

The script outputs the 20 movies with the highest predicted scores for the specified user.
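Ranking on predicts alone can surface movies the user has already rated. One possible refinement, sketched here with a hypothetical helper top_n_unseen and made-up arrays, is to mask those entries out using the record matrix before sorting.

```python
import numpy as np

def top_n_unseen(predicts, record, user_id, n=20):
    """Indices of the n highest-scoring movies the user has not rated."""
    scores = predicts[:, user_id].astype(float)   # astype returns a copy
    scores[record[:, user_id] == 1] = -np.inf     # push rated movies to the bottom
    order = np.argsort(scores)[::-1]
    unseen_count = int((record[:, user_id] == 0).sum())
    return order[:min(n, unseen_count)]

predicts = np.array([[4.8, 3.0],
                     [4.1, 4.9],
                     [3.2, 2.0],
                     [4.5, 1.0]])
record = np.array([[1, 0],
                   [0, 1],
                   [0, 0],
                   [0, 0]])

top = top_n_unseen(predicts, record, user_id=0, n=2)
print(top)  # [3 1]: movie 0 scores highest but is already rated by user 0
```

The returned indices are movieRow values, so titles can be looked up with movies_df.iloc[i]['title'] exactly as in the loop above.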
