Client‑Side Image Similarity Computation: Methods, Experiments, and Findings

This study compares hash‑based, CNN‑based, and local‑feature methods for client‑side image similarity detection in e‑commerce. Hash methods are fast but not accurate enough, and CNNs are accurate but costly. The Hessian‑Affine detector combined with SIFT descriptors gives the best balance of computational efficiency, robustness to transformations, and recall/precision for on‑device duplicate filtering.

Xianyu Technology

Apr 26, 2018

In e‑commerce scenarios, duplicate product listings waste storage and compute resources. Performing image similarity detection on the client before uploading can filter repeated images and videos, saving bandwidth and cloud storage.

This work evaluates three categories of image‑similarity techniques: perceptual hash algorithms, convolutional‑neural‑network (CNN) based methods, and local invariant‑feature matching. By comparing computational complexity and retrieval efficiency, the study selects Hessian‑Affine feature detection combined with SIFT descriptors as the preferred on‑device solution.

Traditional hash methods (average hash, difference hash, perceptual hash) generate compact binary fingerprints; similarity is measured by Hamming distance. Experiments show that while fast, these hashes provide insufficient accuracy for practical duplicate‑detection tasks.
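To make the fingerprint idea concrete, here is a minimal pure‑Python sketch of average hash, difference hash, and Hamming distance. It assumes the image has already been converted to grayscale and downscaled to a small grid (commonly 8x8 for average hash and 9 columns by 8 rows for difference hash); the function names and the nested‑list input format are illustrative choices, not taken from the original study.

```python
def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is >= the grid mean.

    `pixels` is a 2D list of grayscale values (assumed already downscaled).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p >= mean else 0 for p in flat]


def difference_hash(pixels):
    """Difference hash: one bit per horizontal neighbour pair, set when left > right."""
    return [
        1 if row[i] > row[i + 1] else 0
        for row in pixels
        for i in range(len(row) - 1)
    ]


def hamming_distance(hash_a, hash_b):
    """Number of bit positions at which two equal-length fingerprints differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Illustrative input: an 8x8 gradient and a uniformly brightened copy of it.
grid = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[p + 3 for p in row] for row in grid]
distance = hamming_distance(average_hash(grid), average_hash(brighter))
```

A small Hamming distance suggests a near duplicate. The cut‑off is application specific and is not given in the text.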

CNN‑based end‑to‑end similarity (e.g., using a VGG16 backbone) extracts deep features and feeds them to a fully‑connected matcher, achieving high accuracy but incurring prohibitive compute cost for large‑scale video retrieval. An alternative is to use intermediate feature maps (e.g., VGG16 block5_pool) and compute Euclidean distances, offering a trade‑off between accuracy and speed.
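The feature‑map variant reduces to a distance between two vectors. The sketch below assumes the activations (for example, the 7x7x512 output of VGG16 block5_pool for a 224x224 input) have already been extracted by some framework and are available as nested lists. The helper names and the `threshold` parameter are hypothetical; the text does not say how the study chose its decision threshold.

```python
import math


def flatten(feature_map):
    """Flatten arbitrarily nested lists of numbers into one flat vector."""
    if isinstance(feature_map, (int, float)):
        return [float(feature_map)]
    vector = []
    for item in feature_map:
        vector.extend(flatten(item))
    return vector


def euclidean_distance(vec_a, vec_b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


def is_duplicate(features_a, features_b, threshold):
    """Treat two images as duplicates when their feature distance is small.

    `threshold` is a hypothetical tuning parameter, not a value from the study.
    """
    distance = euclidean_distance(flatten(features_a), flatten(features_b))
    return distance <= threshold
```

Compared with the end‑to‑end matcher, the network runs once per image and the comparison itself is a single vector operation, which is where the speed gain comes from.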

Local invariant‑feature approaches are examined next: SIFT on its own, and SIFT descriptors computed on Hessian‑Affine regions. In the experiments, SIFT is robust to cropping, subtitle overlay, and brightness changes, but struggles with rotation. Requiring a minimum number of key points per query (e.g., 30) markedly improves precision, because queries with too few key points are discarded rather than matched unreliably.
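The key‑point threshold can be pictured as a gate in front of the matcher. The sketch below is one possible shape for that gate, assuming key points and descriptor matches have already been computed elsewhere. Only the value 30 comes from the text; the function name, the three labels, and `min_match_ratio` are hypothetical.

```python
MIN_KEYPOINTS = 30  # example threshold quoted in the text


def classify_query(num_keypoints, num_good_matches, min_match_ratio=0.2):
    """Decide what to do with one query image.

    Returns "excluded" when the query has too few key points to be trusted,
    otherwise "duplicate" or "distinct" based on the share of key points that
    found a good match. `min_match_ratio` is a hypothetical tuning parameter.
    """
    if num_keypoints < MIN_KEYPOINTS:
        return "excluded"
    match_ratio = num_good_matches / num_keypoints
    return "duplicate" if match_ratio >= min_match_ratio else "distinct"
```

For instance, `classify_query(12, 10)` is excluded even though most of its key points matched, because 12 key points are too few to support a confident decision.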

Video‑frame experiments (10,000 videos, 10 frames each) show that Hessian‑Affine + SIFT raises both recall and precision relative to plain SIFT while reducing the exclusion rate to about 4%.
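The three reported metrics can be computed from per‑query outcomes as sketched below. The definitions are assumptions, since the text does not spell them out: exclusion rate is taken as the share of queries discarded by the key‑point threshold, and precision and recall are computed over the remaining queries only. The sample data is made up for illustration and does not reproduce the study's results.

```python
def summarize(results):
    """Compute precision, recall, and exclusion rate.

    `results` is a list of (decision, is_true_duplicate) pairs, where decision
    is "duplicate", "distinct", or "excluded". Precision and recall are
    computed over non-excluded queries only (an assumed definition).
    """
    total = len(results)
    answered = [(d, truth) for d, truth in results if d != "excluded"]
    true_pos = sum(1 for d, truth in answered if d == "duplicate" and truth)
    false_pos = sum(1 for d, truth in answered if d == "duplicate" and not truth)
    false_neg = sum(1 for d, truth in answered if d == "distinct" and truth)
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    return {
        "precision": true_pos / predicted if predicted else 0.0,
        "recall": true_pos / actual if actual else 0.0,
        "exclusion_rate": (total - len(answered)) / total if total else 0.0,
    }


# Made-up outcomes for ten queries, for illustration only.
sample = (
    [("duplicate", True)] * 3
    + [("duplicate", False)]
    + [("distinct", True)]
    + [("distinct", False)] * 4
    + [("excluded", True)]
)
metrics = summarize(sample)
```

Under these definitions, lowering the exclusion rate only helps if precision and recall hold up on the queries that are no longer discarded, which is what the text reports for Hessian‑Affine + SIFT.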

Conclusion: For on‑device image similarity, Hessian‑Affine detection paired with SIFT descriptors offers the best balance of computational cost, retrieval efficiency, and robustness. Future work may integrate local features with deep semantic features to further enhance performance.

