
Implementing Perceptual Hash for Image Similarity Matching in Mini Programs

To automate matching of theme‑image URLs in a mini‑program skin‑changing feature, the author adopts perceptual hashing—using algorithms such as aHash, pHash and dHash—to generate compact 64‑bit fingerprints, compare them via Hamming distance, and reliably identify visually similar images despite minor edits or scaling.

Sohu Tech Products

In a mini‑program skin‑changing feature, each theme color corresponds to a different image library, and image URLs are configured via online keys. Manually matching new images to their keys is time‑consuming, so the author explored perceptual hashing to automate the matching based on visual similarity.

Perceptual Hash

Perceptual hashing (also called a fingerprint algorithm) generates a compact representation of multimedia content that reflects visual similarity. Unlike cryptographic hashes (MD5, SHA‑1), whose values look random and are only useful for exact‑equality checks, perceptual hashes allow similarity comparison: the smaller the distance between two hashes, the more alike the images.

Difference from Cryptographic Hashes

Cryptographic hashes change drastically with any minor change in the file, making them unsuitable for image similarity: a format change, a metadata edit, or a slight visual edit produces a completely different hash value. Perceptual hashes instead capture the coarse visual structure of the image (its overall pattern of light and dark), so they stay stable under scaling, brightness adjustments, slight cropping, and similar minor edits.
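The sensitivity of cryptographic hashes is easy to see with Python's standard hashlib module. The sketch below hashes two byte strings that differ in a single byte; the byte strings are made‑up stand‑ins for two nearly identical image files, not real image data.

```python
import hashlib

# Hypothetical stand-ins for two image files that differ in one byte.
original = b"pixel-data-0000"
edited = b"pixel-data-0001"

digest_a = hashlib.sha1(original).hexdigest()
digest_b = hashlib.sha1(edited).hexdigest()

# Count how many of the 160 digest bits differ between the two hashes.
differing_bits = bin(int(digest_a, 16) ^ int(digest_b, 16)).count("1")

print(digest_a)
print(digest_b)
print(differing_bits, "of 160 bits differ")
```

A one‑byte change typically flips a large share of the digest bits, so the number of differing bits says nothing about how similar the inputs were.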

Hamming Distance

The similarity between two perceptual hashes is usually measured by the Hamming distance (the number of differing bits). Typical thresholds are:

Hamming distance = 0 → identical
Hamming distance < 5 → very similar
Hamming distance > 10 → different pictures

Hamming distance is also defined in information theory as the number of differing characters between two equal‑length strings.
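As a minimal sketch, the distance between two hex‑encoded hashes can be computed by XOR‑ing them and counting the set bits. The function names and the second hash value are illustrative assumptions; the "uncertain" label for distances from 5 to 10 is also an assumption, since the thresholds above do not cover that range.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Return the number of differing bits between two hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def classify(distance: int) -> str:
    """Map a distance to the rough thresholds quoted above."""
    if distance == 0:
        return "identical"
    if distance < 5:
        return "very similar"
    if distance > 10:
        return "different"
    return "uncertain"  # 5..10 is not covered by the quoted thresholds


a = "00010E3CE08FFFFE"
b = "00010E3CE08FFFFC"  # hypothetical hash that differs from `a` in one bit
d = hamming_distance(a, b)
print(d, classify(d))  # 1 very similar
```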

Implementation Process

Common perceptual hash algorithms include aHash, pHash, and dHash. The general workflow is: simplify the image → extract pixel values → compute the hash.

Average Hash (aHash)

aHash compares each pixel’s grayscale value to the average grayscale value of the image.

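from PIL import Image

image = Image.open("theme.png")  # Hypothetical path; any image file works.
image = image.resize((8, 8), Image.LANCZOS)  # Reduce its size (Image.ANTIALIAS was removed in Pillow 10).
image = image.convert("L")  # Convert to grayscale.
pixels = list(image.getdata())  # 64 grayscale values, row by row.
avg = sum(pixels) / len(pixels)
bits = "".join('1' if pixel < avg else '0' for pixel in pixels)  # '00010100...'
hexadecimal = format(int(bits, 2), '016X')  # 16 uppercase hex digits.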

The resulting 64‑bit hash (e.g., 00010E3CE08FFFFE) can be compared with the hashes of other images using Hamming distance; a distance close to 0 indicates high similarity.
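For the key‑matching use case, one possible approach is to store a hash per configuration key and pick the key whose hash is closest to the new image's hash. This is a sketch only: the key names, the stored hash values, and the max_distance default of 4 (mirroring the "very similar" threshold of less than 5) are assumptions for illustration, not the author's configuration.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def best_match(new_hash, known_hashes, max_distance=4):
    """Return (key, distance) for the closest known hash, or None if too far."""
    if not known_hashes:
        return None
    key, distance = min(
        ((k, hamming_distance(new_hash, h)) for k, h in known_hashes.items()),
        key=lambda pair: pair[1],
    )
    return (key, distance) if distance <= max_distance else None


# Hypothetical configuration keys and hashes, for illustration only.
known = {
    "home_banner": "00010E3CE08FFFFE",
    "profile_bg": "FFFF0000FFFF0000",
}
print(best_match("00010E3CE08FFFFC", known))  # ('home_banner', 1)
print(best_match("0F0F0F0F0F0F0F0F", known))  # None
```

Returning None when nothing is close enough leaves ambiguous images for manual review instead of forcing a wrong match.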

Perceptual Hash (pHash)

pHash applies a Discrete Cosine Transform (DCT) to the image, keeping low‑frequency components that represent the overall structure while discarding high‑frequency details. After resizing the image to 32×32, the top‑left 8×8 DCT coefficients are extracted, compared to their mean, and encoded into a 64‑bit binary hash.
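The steps can be sketched in pure Python on a 32×32 grid of grayscale values (resizing and grayscale conversion are assumed to have happened already). Two details are assumptions rather than something the article specifies: the mean is taken over the 63 non‑DC coefficients, a common variant that keeps the very large first coefficient from skewing the threshold, and a bit is 1 when the coefficient is above that mean. Some implementations use the median instead.

```python
import math

SIZE = 32  # side length of the reduced grayscale image
LOW = 8    # side length of the low-frequency block that is kept


def phash(gray):
    """Hash a SIZE x SIZE grid of grayscale values into 16 hex digits."""
    # cos_table[u][x] is the DCT-II basis value for frequency u at position x.
    cos_table = [
        [math.cos((2 * x + 1) * u * math.pi / (2 * SIZE)) for x in range(SIZE)]
        for u in range(LOW)
    ]
    # Orthonormal DCT scaling factors for frequency 0 and frequencies 1..7.
    scale = [math.sqrt(1 / SIZE)] + [math.sqrt(2 / SIZE)] * (LOW - 1)

    coeffs = []
    for u in range(LOW):      # vertical frequency
        for v in range(LOW):  # horizontal frequency
            total = sum(
                gray[y][x] * cos_table[u][y] * cos_table[v][x]
                for y in range(SIZE)
                for x in range(SIZE)
            )
            coeffs.append(scale[u] * scale[v] * total)

    # Assumption: threshold on the mean of the AC terms, skipping coeffs[0] (DC).
    mean = sum(coeffs[1:]) / (len(coeffs) - 1)
    bits = "".join("1" if c > mean else "0" for c in coeffs)
    return format(int(bits, 2), "016X")


# Hypothetical test images: a horizontal ramp and a uniformly brighter copy.
ramp = [[8 * x for x in range(SIZE)] for _ in range(SIZE)]
brighter = [[value + 5 for value in row] for row in ramp]
print(phash(ramp), phash(brighter))
```

A uniform brightness shift only changes the DC coefficient, so the two printed hashes should match, which illustrates why the low‑frequency structure is a stable fingerprint.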

Difference Hash (dHash)

dHash resizes the image to 9×8 (nine pixels wide, eight high), compares each pixel with its right‑hand neighbor, and records a 1 if the left pixel is brighter, otherwise 0. Each row yields eight comparisons, so the eight rows produce a 64‑bit hash, which is again compared via Hamming distance. dHash costs about the same to compute as aHash while offering comparable or better accuracy.
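A minimal sketch of the comparison step, assuming the image has already been reduced to 8 rows of 9 grayscale values; the function name and the two test grids are illustrative.

```python
def dhash(gray):
    """Hash an 8-row by 9-column grid of grayscale values into 16 hex digits."""
    bits = []
    for row in gray:
        # 9 pixels per row give 8 left/right comparisons.
        for left, right in zip(row, row[1:]):
            bits.append("1" if left > right else "0")
    return format(int("".join(bits), 2), "016X")


# Hypothetical grids: rows that get darker, then brighter, from left to right.
darkening = [[200 - 20 * x for x in range(9)] for _ in range(8)]
brightening = [[20 * x for x in range(9)] for _ in range(8)]
print(dhash(darkening))    # FFFFFFFFFFFFFFFF
print(dhash(brightening))  # 0000000000000000
```

Because only the direction of each gradient is recorded, adding a constant to every pixel leaves the hash unchanged.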

All three algorithms can be implemented with OpenCV or the Python imagehash library, which provides ready‑made functions.

Summary

Perceptual hashing offers a simple yet effective way to compare image similarity for scenarios such as skin changes, minor color adjustments, or scaling. It performs well for slight visual variations but may struggle with heavy cropping, rotation, or added watermarks that alter the color distribution. No single method guarantees 100% accuracy; combining multiple techniques can improve robustness.

