How Perceptual Hashing Powers Fast Image Similarity Search
This article explains the simple perceptual hash algorithm behind image similarity search, detailing each processing step, how to compute Hamming distance, and introduces alternative methods like color histograms and Otsu's thresholding for robust visual matching.
In 2011 Google added a "Search by image" button to its homepage, allowing users to upload an image or provide a URL and retrieve visually similar pictures from the web.
One of the core techniques behind such services is the perceptual hash algorithm , which creates a fingerprint string for each image and compares fingerprints to assess similarity.
Step 1 – Resize : Reduce the image to an 8×8 pixel grid (64 pixels total) to discard fine details while preserving overall structure and luminance.
Step 2 – Grayscale : Convert the 8×8 image to 64 levels of gray, limiting each pixel to one of 64 possible shades.
Step 3 – Compute average : Calculate the average gray value of the 64 pixels.
Step 4 – Compare pixels : For each pixel, record a 1 if its gray value is greater than or equal to the average, otherwise record a 0.
Step 5 – Build hash : Concatenate the 64 bits into a 64‑bit integer; this is the image’s fingerprint. The order of bits is irrelevant as long as it is consistent across images.
To compare two images, compute the Hamming distance between their hashes. A distance ≤ 5 indicates strong similarity, while a distance > 10 suggests the images are different.
The algorithm is implemented in a short Python script (≈53 lines) called imgHash.py, which takes a reference image and a directory of candidate images and returns the Hamming distance for each pair.
Advantages: simplicity, speed, and invariance to scaling. Drawbacks: it fails when the image content is altered (e.g., added text). For more robust matching, stronger algorithms such as pHash or SIFT are used, which follow the same basic principle of converting an image to a hash string before comparison.
Other simple methods include:
Color histogram method : Generate a histogram of color distribution (often reduced to 64 bins by grouping RGB values) and compare histograms using Pearson correlation or cosine similarity.
Content feature method : Resize the image to a small gray‑scale version (e.g., 50×50), apply a threshold to obtain a binary image, and use the resulting 0‑1 matrix as a feature vector. The optimal threshold can be found with Otsu’s method , which minimizes intra‑class variance (or equivalently maximizes inter‑class variance). The algorithm evaluates each possible gray level, computes class weights w₁, w₂, class means μ₁, μ₂, and variances σ₁², σ₂², then selects the threshold that yields the smallest w₁σ₁² + w₂σ₂² (or largest w₁w₂(μ₁‑μ₂)²).
Once a binary matrix is obtained, similarity between two images can be measured by counting the number of differing bits using an XOR operation; fewer differing bits mean higher similarity.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
