How Cosine Similarity Powers Movie Recommendations: A Python Guide

This tutorial explains various similarity metrics such as cosine similarity, Euclidean distance, Jaccard index, and Pearson correlation, demonstrates a Python function to compute user interest similarity, and shows how to generate movie recommendations with example code and output.


After converting a dataset, we can use similarity metrics to find movies similar to those a user has watched. Commonly used metrics include cosine similarity, Euclidean distance, the Jaccard index, and Pearson correlation.

Cosine similarity

Euclidean distance

Jaccard index

Pearson correlation
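The article covers cosine similarity and Pearson correlation in detail below. For completeness, the other two metrics can be sketched as follows; the rating vectors and watched-movie sets here are invented for illustration, not taken from the article's dataset:

```python
from math import sqrt

# Euclidean distance between two hypothetical rating vectors
# (lower distance = more similar).
a = [4.0, 5.0, 1.0]
b = [5.0, 5.0, 2.0]
euclidean = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
# A common way to convert the distance to a 0..1 similarity score:
euclidean_sim = 1 / (1 + euclidean)

# Jaccard index between the sets of movies two users have watched:
# size of the intersection divided by size of the union.
watched_a = {"Snitch", "Superman Returns", "Just My Luck"}
watched_b = {"Snitch", "Superman Returns", "Lady in the Water"}
jaccard = len(watched_a & watched_b) / len(watched_a | watched_b)

print(round(euclidean, 2), round(jaccard, 2))
```

Note the difference in kind: Euclidean distance and cosine similarity work on numeric rating vectors, while the Jaccard index only needs the sets of items each user interacted with, which makes it useful when explicit ratings are unavailable.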

4.1 Cosine Similarity

Cosine similarity evaluates the similarity of two vectors by calculating the cosine of the angle between them. The idea is easiest to visualize in 2-dimensional space, but it extends to vectors of any dimension.

It is widely used in machine learning for measuring similarity between users or items. The mathematical formula is shown below.

Cosine similarity formula:

cos(A, B) = Σᵢ (Aᵢ × Bᵢ) / ( √(Σᵢ Aᵢ²) × √(Σᵢ Bᵢ²) )

The formula can be read as the sum of the products of user A's and user B's ratings for each movie, divided by the product of the lengths of the two rating vectors (the square roots of their sums of squared ratings).
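As a quick numeric check of that interpretation, here is a minimal sketch with two hypothetical users' ratings for three shared movies (the values are invented for illustration):

```python
from math import sqrt

# Hypothetical ratings by user A and user B for three shared movies
a = [4, 5, 1]
b = [5, 5, 2]

dot = sum(x * y for x, y in zip(a, b))   # sum of rating products: 47
norm_a = sqrt(sum(x * x for x in a))     # length of A's rating vector
norm_b = sqrt(sum(y * y for y in b))     # length of B's rating vector

similarity = dot / (norm_a * norm_b)
print(round(similarity, 2))
```

The two users rate the movies in similar proportions, so the angle between the vectors is small and the score lands close to 1.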

4.2 Pearson Correlation

Pearson correlation yields results very similar to cosine similarity; in fact, it is equivalent to cosine similarity computed on mean-centered data. A detailed explanation can be found on Wikipedia.

Pearson correlation illustration
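The relationship between the two metrics can be shown directly: subtracting each user's mean rating and then taking the cosine gives the Pearson correlation. The rating vectors below are invented for illustration:

```python
from math import sqrt

def cosine(u, v):
    # Cosine of the angle between two equal-length vectors
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def pearson(u, v):
    # Pearson correlation = cosine similarity of the mean-centered vectors
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([x - mu for x in u], [y - mv for y in v])

a = [4, 5, 1, 3]
b = [5, 5, 2, 4]
print(round(cosine(a, b), 2), round(pearson(a, b), 2))
```

Mean-centering removes each user's rating bias (one user may rate everything high, another everything low), which is why Pearson correlation is often preferred when users' rating scales differ.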

We now have a function that computes the similarity between two movies from the ratings of users who watched both. It uses cosine similarity and forms the core of our recommendation system.

from math import sqrt

def cos_similarity(people, movie1, movie2):
    # Collect the users who rated both movies
    si = {}
    for item in people[movie1]:
        if item in people[movie2]:
            si[item] = 1
    # No shared raters: the similarity is undefined, so return 0
    if len(si) == 0:
        return 0
    # Dot product and squared vector lengths over the shared raters
    sum1 = 0
    sum21 = 0
    sum22 = 0
    for item in si:
        sum1 += people[movie1][item] * people[movie2][item]
        sum21 += pow(people[movie1][item], 2)
        sum22 += pow(people[movie2][item], 2)
    # Guard against division by zero for all-zero rating vectors
    if sum21 == 0 or sum22 == 0:
        return 0
    return round(sum1 / (sqrt(sum21) * sqrt(sum22)), 2)

5 Output

First, we need a collection of movies that have been watched:

movies_watched = ["You, Me and Dupree", "Catch Me If You Can", "Snitch"]

The system learns from this data and outputs recommended movies with similarity scores, for example:

------------------------------
| You, Me and Dupree         |
------------------------------
Catch Me If You Can 0.97
Just My Luck 0.85
Lady in the Water 0.96
Snakes on a Plane 0.97
Snitch 1.0
Superman Returns 0.98
The Night Listener 0.96
------------------------------
| Catch Me If You Can        |
------------------------------
Just My Luck 1.0
Lady in the Water 0.98
Snakes on a Plane 0.99
Snitch 1.0
Superman Returns 1.0
The Night Listener 0.92
You, Me and Dupree 0.97
------------------------------
| Snitch                     |
------------------------------
Catch Me If You Can 1.0
Just My Luck 1.0
Lady in the Water 0.91
Snakes on a Plane 0.99
Superman Returns 0.99
The Night Listener 0.88
You, Me and Dupree 1.0
------------------------------

By setting a similarity threshold (e.g., 0.98), only movies exceeding the threshold are displayed, yielding a more concise recommendation list.
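The thresholding step can be sketched end to end. Everything below is illustrative: the ratings dictionary, helper names, and resulting scores are invented for the sketch (in the shape the article's function expects: each movie maps to a dict of user ratings), not taken from the article's dataset:

```python
from math import sqrt

# Illustrative dataset: movie -> {user: rating}
ratings = {
    "You, Me and Dupree": {"Lisa": 2.5, "Gene": 3.5, "Mick": 2.0, "Jack": 3.5},
    "Snitch":             {"Lisa": 3.0, "Gene": 4.0, "Mick": 3.0, "Jack": 4.0},
    "Superman Returns":   {"Lisa": 3.5, "Gene": 5.0, "Mick": 3.0, "Jack": 5.0},
    "Just My Luck":       {"Lisa": 3.0, "Gene": 1.5},
}

def cosine(people, m1, m2):
    # Cosine similarity over users who rated both movies
    shared = [u for u in people[m1] if u in people[m2]]
    if not shared:
        return 0.0
    dot = sum(people[m1][u] * people[m2][u] for u in shared)
    n1 = sqrt(sum(people[m1][u] ** 2 for u in shared))
    n2 = sqrt(sum(people[m2][u] ** 2 for u in shared))
    return round(dot / (n1 * n2), 2) if n1 and n2 else 0.0

def recommend(people, watched, threshold=0.98):
    # For each watched movie, keep only titles scoring at or above the threshold
    recs = {}
    for movie in watched:
        scores = {}
        for other in people:
            if other == movie:
                continue
            s = cosine(people, movie, other)
            if s >= threshold:
                scores[other] = s
        recs[movie] = scores
    return recs

print(recommend(ratings, ["You, Me and Dupree"]))
```

With this toy data, "Just My Luck" scores about 0.88 against "You, Me and Dupree" and is filtered out, while "Snitch" and "Superman Returns" clear the 0.98 bar, mirroring the pruned lists shown below.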

------------------------------
| You, Me and Dupree         |
------------------------------
Snitch 1.0
Superman Returns 0.98
------------------------------
| Catch Me If You Can        |
------------------------------
Just My Luck 1.0
Lady in the Water 0.98
Snakes on a Plane 0.99
Snitch 1.0
Superman Returns 1.0
------------------------------
| Snitch                     |
------------------------------
Catch Me If You Can 1.0
Just My Luck 1.0
Snakes on a Plane 0.99
Superman Returns 0.99
You, Me and Dupree 1.0
------------------------------

The complete code is available on GitHub at https://github.com/Mitko06/Recommender-System.

Conclusion

We have covered the fundamentals of building a recommendation system, focusing on similarity metrics such as cosine similarity, Euclidean distance, and Pearson correlation. While real‑world systems are more complex, these basics form the backbone of most recommender engines.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
