Step-by-Step Guide to Building a Movie Recommendation System with TensorFlow
This tutorial walks through collecting and cleaning the MovieLens dataset, constructing rating and record matrices, normalizing ratings, defining a collaborative‑filtering model in TensorFlow, training it with the Adam optimizer, evaluating performance, and finally generating personalized movie recommendations for a chosen user.
Step 1: Collect and Clean Data
Download the MovieLens dataset (ml-latest-small) and load the ratings.csv and movies.csv files using pandas.
import pandas as pd
import numpy as np
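# The code below uses the TensorFlow 1.x graph API (tf.Session, tf.train.AdamOptimizer).
# On TensorFlow 2.x, use `import tensorflow.compat.v1 as tf` and call `tf.disable_v2_behavior()` instead.
import tensorflow as tf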
ratings_df = pd.read_csv('./ml-latest-small/ratings.csv')
movies_df = pd.read_csv('./ml-latest-small/movies.csv')
Generate a new column movieRow as the row index, keep only the columns movieRow, movieId and title, and save the processed movies file.
movies_df['movieRow'] = movies_df.index
movies_df = movies_df[['movieRow','movieId','title']]
movies_df.to_csv('./ml-latest-small/moviesProcessed.csv', index=False, header=True, encoding='utf-8')
Merge the ratings with the processed movies on movieId and keep only userId, movieRow and rating. Save the processed ratings.
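The movieRow column exists because MovieLens movieId values are not contiguous, so they cannot be used directly as matrix row indices. The sketch below illustrates this on hypothetical toy data (the IDs, titles and toy_* names are made up for the example) and shows how the merge swaps each movieId for its compact row index.

```python
import pandas as pd

# Hypothetical toy data: movieId values are sparse (1, 50, 900).
toy_movies = pd.DataFrame({
    'movieId': [1, 50, 900],
    'title': ['Movie A', 'Movie B', 'Movie C'],
})
toy_movies['movieRow'] = toy_movies.index  # compact indices 0, 1, 2

toy_ratings = pd.DataFrame({
    'userId': [1, 1, 2],
    'movieId': [900, 1, 50],
    'rating': [4.0, 3.5, 5.0],
})

# The default inner join keeps only ratings whose movieId exists in toy_movies.
toy_merged = pd.merge(toy_ratings, toy_movies, on='movieId')
toy_merged = toy_merged[['userId', 'movieRow', 'rating']]
print(toy_merged)
```

The tutorial's merge on the full dataset follows.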
ratings_df = pd.merge(ratings_df, movies_df, on='movieId')
ratings_df = ratings_df[['userId','movieRow','rating']]
ratings_df.to_csv('./ml-latest-small/ratingsProcessed.csv', index=False, header=True, encoding='utf-8')
Step 2: Create Rating and Record Matrices
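In this step, rating is a movies-by-users matrix holding each known score, and record is a 0/1 mask of the same shape marking which entries were actually rated. As an illustration only, the sketch below builds both matrices for a hypothetical three-rating dataset using NumPy fancy indexing, which gives the same result as a row-by-row loop without iterating over a DataFrame. The toy_* names are assumptions of this example.

```python
import numpy as np

# Hypothetical toy ratings as parallel arrays: (userId, movieRow, rating).
toy_user_ids = np.array([1, 1, 2])
toy_movie_rows = np.array([2, 0, 1])
toy_scores = np.array([4.0, 3.5, 5.0])

# userId starts at 1 in MovieLens, so max + 1 leaves column 0 unused.
toy_user_no = toy_user_ids.max() + 1
toy_movie_no = toy_movie_rows.max() + 1

toy_rating = np.zeros((toy_movie_no, toy_user_no))
# Fancy indexing assigns every (movieRow, userId) cell in one step.
toy_rating[toy_movie_rows, toy_user_ids] = toy_scores
toy_record = (toy_rating > 0).astype(int)
print(toy_rating)
print(toy_record)
```

The tutorial's loop-based construction on the full dataset follows.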
userNo = ratings_df['userId'].max() + 1
movieNo = ratings_df['movieRow'].max() + 1
rating = np.zeros((movieNo, userNo))
for index, row in ratings_df.iterrows():
    rating[int(row['movieRow']), int(row['userId'])] = row['rating']
record = (rating > 0).astype(int)
Step 3: Build the Model
Normalize ratings by subtracting each movie’s mean rating.
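Only the rated entries are centered; unrated cells stay at zero, and the per-movie mean is added back at prediction time. To make the effect concrete, here is a small sketch on a hypothetical 2x3 matrix. It uses a vectorized formulation that is an assumption of this example, not the tutorial's own code, and the function name normalize_ratings_vectorized is hypothetical.

```python
import numpy as np

def normalize_ratings_vectorized(rating, record):
    """Subtract each movie's mean, computed over its rated entries only."""
    counts = record.sum(axis=1, keepdims=True)
    sums = (rating * record).sum(axis=1, keepdims=True)
    # Movies with no ratings get a mean of 0 instead of NaN.
    rating_mean = np.divide(sums, counts,
                            out=np.zeros_like(sums, dtype=float),
                            where=counts > 0)
    # Multiplying by record keeps unrated entries at zero.
    rating_norm = (rating - rating_mean) * record
    return rating_norm, rating_mean

toy_rating = np.array([[5.0, 0.0, 3.0],
                       [0.0, 0.0, 0.0]])
toy_record = (toy_rating > 0).astype(int)
toy_norm, toy_mean = normalize_ratings_vectorized(toy_rating, toy_record)
print(toy_mean.ravel())
print(toy_norm)
```

The tutorial's loop-based implementation follows.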
def normalizeRatings(rating, record):
    m, n = rating.shape
    rating_mean = np.zeros((m, 1))
    rating_norm = np.zeros((m, n))
    for i in range(m):
        idx = record[i, :] != 0
        rating_mean[i] = np.mean(rating[i, idx])
        rating_norm[i, idx] = rating[i, idx] - rating_mean[i]
    return rating_norm, rating_mean
rating_norm, rating_mean = normalizeRatings(rating, record)
# Movies with no ratings produce a NaN mean (mean of an empty slice); replace NaN with 0.
rating_norm = np.nan_to_num(rating_norm)
rating_mean = np.nan_to_num(rating_mean)
Define model parameters and loss function (regularized squared error).
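For reference, the objective that the TensorFlow expression computes can also be written in plain NumPy. This is a sketch for checking the formula on small inputs, not a replacement for the TensorFlow graph; the function name cofi_loss and the lam parameter (the regularization weight, which is effectively 1 in the tutorial's expression) are assumptions of this example.

```python
import numpy as np

def cofi_loss(X, Theta, rating_norm, record, lam=1.0):
    """Regularized squared error, counted over rated entries only."""
    # Predicted (normalized) rating for every movie/user pair.
    pred = X @ Theta.T
    # Multiplying by record zeroes the error on unrated entries.
    squared_error = np.sum(((pred - rating_norm) * record) ** 2)
    regularization = lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
    return 0.5 * squared_error + 0.5 * regularization

# Hypothetical toy inputs: 2 movies, 2 users, 1 latent feature.
toy_X = np.array([[1.0], [2.0]])
toy_Theta = np.array([[1.0], [0.5]])
toy_norm = np.array([[0.5, 0.0], [0.0, 1.0]])
toy_record = np.array([[1, 0], [0, 1]])
toy_loss = cofi_loss(toy_X, toy_Theta, toy_norm, toy_record)
print(toy_loss)
```

The tutorial's TensorFlow definition follows.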
num_features = 10
X_parameters = tf.Variable(tf.random_normal([movieNo, num_features], stddev=0.35))
Theta_parameters = tf.Variable(tf.random_normal([userNo, num_features], stddev=0.35))
loss = 0.5 * tf.reduce_sum(((tf.matmul(X_parameters, Theta_parameters, transpose_b=True) - rating_norm) * record) ** 2) \
    + 0.5 * (tf.reduce_sum(X_parameters ** 2) + tf.reduce_sum(Theta_parameters ** 2))
Use the Adam optimizer to minimize the loss.
optimizer = tf.train.AdamOptimizer(1e-4)
train = optimizer.minimize(loss)
Step 4: Train the Model
tf.summary.scalar('loss', loss)
summaryMerged = tf.summary.merge_all()
writer = tf.summary.FileWriter('./movie_tensorboard')
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(5000):
    _, movie_summary = sess.run([train, summaryMerged])
    writer.add_summary(movie_summary, i)
Step 5: Evaluate the Model
Current_X_parameters, Current_Theta_parameters = sess.run([X_parameters, Theta_parameters])
predicts = np.dot(Current_X_parameters, Current_Theta_parameters.T) + rating_mean
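errors = np.sqrt(np.sum((predicts - rating) ** 2))
The source reports a value of approximately 4037.90. Note that this expression is the square root of the summed squared error over every cell of the matrix, unrated entries included, rather than a per-rating root‑mean‑square error, which is why the number is so large.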
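A per-rating error restricted to the observed entries is usually easier to interpret, since unrated cells are stored as 0 and would otherwise count as errors. A minimal sketch, assuming predicts, rating and record are shaped as in the earlier steps; the helper name masked_rmse is hypothetical and the toy arrays are made up for illustration.

```python
import numpy as np

def masked_rmse(predicts, rating, record):
    """RMSE computed only where record marks an observed rating."""
    mask = record.astype(bool)
    diff = predicts[mask] - rating[mask]
    return np.sqrt(np.mean(diff ** 2))

# Hypothetical toy data: two observed ratings, two unrated cells.
toy_rating = np.array([[5.0, 0.0], [0.0, 3.0]])
toy_record = (toy_rating > 0).astype(int)
toy_predicts = np.array([[4.0, 2.5], [1.0, 4.0]])
toy_rmse = masked_rmse(toy_predicts, toy_rating, toy_record)
print(toy_rmse)
```

On the full dataset the call would be masked_rmse(predicts, rating, record), which yields a figure on the same scale as the ratings themselves.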
Step 6: Build the Complete Recommendation System
user_id = input('Enter the user ID you want to recommend for: ')
sortedResult = predicts[:, int(user_id)].argsort()[::-1]
idx = 0
print('===== Top 20 recommended movies for this user ====='.center(80, '='))
for i in sortedResult:
    print('Score: %.2f, Movie: %s' % (predicts[i, int(user_id)], movies_df.iloc[i]['title']))
    idx += 1
    if idx == 20:
        break
The script outputs the 20 movies with the highest predicted scores for the specified user.
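Sorting the raw predictions can surface movies the user has already rated. One common refinement, shown here as a sketch rather than as part of the original script, is to mask those movies out with the record matrix before ranking. The function name top_n_unseen is hypothetical and the toy arrays are made up for illustration.

```python
import numpy as np

def top_n_unseen(predicts, record, user_id, n=20):
    """Return row indices of the n highest-scoring movies the user has not rated."""
    scores = predicts[:, user_id].astype(float)   # astype returns a copy
    scores[record[:, user_id] != 0] = -np.inf     # push rated movies to the bottom
    order = np.argsort(scores)[::-1]
    n_unseen = int(np.sum(record[:, user_id] == 0))
    return order[:min(n, n_unseen)]

# Hypothetical toy data: 3 movies, 2 users; user 0 has already rated movie 2.
toy_predicts = np.array([[4.5, 1.0], [3.0, 2.0], [5.0, 4.0]])
toy_record = np.array([[0, 1], [0, 0], [1, 0]])
toy_top = top_n_unseen(toy_predicts, toy_record, user_id=0, n=2)
print(toy_top)
```

The returned indices can be passed to movies_df.iloc to look up titles, in the same way as the loop above.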
Python Programming Learning Circle
