Client‑Side Image Similarity Computation: Methods, Experiments, and Findings
This study compares hash‑based, CNN‑based, and local‑feature methods for client‑side image similarity detection in e‑commerce. Hash methods are fast but imprecise, and CNNs are accurate but costly. The Hessian‑Affine detector combined with SIFT descriptors delivers the best balance of computational efficiency, robustness to transformations, and recall and precision for on‑device duplicate filtering.
In e‑commerce scenarios, duplicate product listings waste storage and compute resources. Performing image similarity detection on the client before uploading can filter repeated images and videos, saving bandwidth and cloud storage.
This work evaluates three categories of image‑similarity techniques: perceptual hash algorithms, local invariant‑feature matching, and convolutional‑neural‑network (CNN) based methods. After comparing their computational complexity and retrieval efficiency, the study selects Hessian‑Affine feature detection combined with SIFT descriptors as the preferred on‑device solution.
Traditional hash methods (average hash, difference hash, perceptual hash) generate compact binary fingerprints, and similarity is measured by the Hamming distance between two fingerprints. Experiments show that these hashes are fast but not accurate enough for practical duplicate‑detection tasks.
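To make the fingerprint idea concrete, here is a minimal pure‑Python sketch of average hash and difference hash with a Hamming‑distance comparison. It assumes the image has already been converted to a small grayscale grid (nested lists of intensities); the function names and the toy data are illustrative and not taken from the study.

```python
def average_hash(pixels):
    """Return a bit list: 1 where a pixel is at or above the mean intensity."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p >= mean else 0 for p in flat]


def difference_hash(pixels):
    """Return a bit list: 1 where a pixel is brighter than its right neighbor."""
    return [1 if row[i] > row[i + 1] else 0
            for row in pixels for i in range(len(row) - 1)]


def hamming_distance(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Toy 3x4 grayscale thumbnails (assumption: a real pipeline would first
# downscale the image, commonly to 8x8 for aHash or 9x8 for dHash).
original = [[10, 20, 30, 40],
            [40, 30, 20, 10],
            [10, 10, 50, 50]]
brighter = [[p + 5 for p in row] for row in original]  # uniform brightness shift
flipped = [list(reversed(row)) for row in original]    # horizontal mirror

d_same = hamming_distance(difference_hash(original), difference_hash(brighter))
d_flip = hamming_distance(difference_hash(original), difference_hash(flipped))
print(d_same, d_flip)
```

In this toy case the uniform brightness shift leaves both fingerprints unchanged, while the mirrored image differs in most bits. That illustrates why such hashes are cheap to compare yet sensitive to geometric edits.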
CNN‑based end‑to‑end methods (e.g., with a VGG16 backbone) extract deep features and feed them to a fully‑connected matcher. They achieve high accuracy but incur prohibitive compute cost for large‑scale video retrieval. An alternative is to take intermediate feature maps (e.g., VGG16 block5_pool) and compute Euclidean distances between them, which trades some accuracy for speed.
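The feature‑map comparison can be sketched as follows. This is a minimal illustration that assumes the feature maps have already been extracted (for example from a layer such as block5_pool) and are available as nested lists. The L2 normalization step and all function names are assumptions made for this example, not details from the study.

```python
import math


def flatten(feature_map):
    """Flatten an H x W x C feature map (nested lists) into one vector."""
    return [v for row in feature_map for cell in row for v in cell]


def l2_normalize(vec):
    """Scale a vector to unit length (assumed preprocessing step)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def feature_distance(map_a, map_b):
    """Euclidean distance between two L2-normalized, flattened feature maps."""
    a = l2_normalize(flatten(map_a))
    b = l2_normalize(flatten(map_b))
    if len(a) != len(b):
        raise ValueError("feature maps must have the same shape")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 2x2x2 "feature maps" standing in for real CNN activations.
map_a = [[[1.0, 0.0], [2.0, 0.0]],
         [[0.0, 3.0], [0.0, 1.0]]]
map_b = [[[2.0 * v for v in cell] for cell in row] for row in map_a]  # scaled copy
map_c = [[[0.0, 4.0], [0.0, 1.0]],
         [[2.0, 0.0], [3.0, 0.0]]]                                    # different content

dist_ab = feature_distance(map_a, map_b)
dist_ac = feature_distance(map_a, map_c)
print(dist_ab, dist_ac)
```

A smaller distance indicates more similar images. Any decision threshold would have to be tuned on labeled data.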
Local invariant‑feature approaches such as SIFT and Hessian‑Affine are also examined. In the experiments, SIFT is robust to cropping, subtitle overlay, and brightness changes, but struggles with rotation. Introducing a minimum key‑point threshold (e.g., 30 points) markedly improves precision by discarding queries that have too few key points to be matched with confidence.
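The effect of a minimum key‑point threshold can be illustrated with a small sketch. It assumes descriptors have already been computed elsewhere (for instance by a SIFT implementation) and are passed in as lists of float vectors. The nearest‑neighbor ratio test, the ratio of 0.75, and the `min_good` cutoff are hypothetical choices for this example; only the threshold of 30 key points comes from the text.

```python
import math

MIN_KEYPOINTS = 30  # threshold value mentioned in the text
RATIO = 0.75        # assumed Lowe-style ratio; not specified in the article


def count_good_matches(query_desc, cand_desc, ratio=RATIO):
    """Count query descriptors whose nearest candidate passes the ratio test."""
    if len(cand_desc) < 2:
        return 0
    good = 0
    for q in query_desc:
        dists = sorted(math.dist(q, c) for c in cand_desc)
        # Accept only if the best match is clearly closer than the runner-up.
        if dists[0] < ratio * dists[1]:
            good += 1
    return good


def match_or_reject(query_desc, cand_desc,
                    min_keypoints=MIN_KEYPOINTS, min_good=10):
    """Return None if the query is excluded, else True/False for 'similar'."""
    if len(query_desc) < min_keypoints:
        return None  # too few key points: make no decision for this query
    return count_good_matches(query_desc, cand_desc) >= min_good


# Toy 2-D descriptors (real SIFT descriptors are 128-dimensional).
query = [(i * 10.0, 0.0) for i in range(30)]
near = [(i * 10.0 + 0.1, 0.0) for i in range(30)]   # slightly perturbed copy
far = [(1000.0 + i, 1000.0) for i in range(30)]     # unrelated descriptors

print(match_or_reject(query, near))      # enough key points, many good matches
print(match_or_reject(query, far))       # enough key points, no good matches
print(match_or_reject(query[:5], near))  # below the key-point threshold
```

Returning `None` rather than `False` keeps excluded queries separate from true negatives, so the share of discarded queries can be reported on its own.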
Video‑frame experiments (10,000 videos, 10 frames each) show that Hessian‑Affine + SIFT raises both recall and precision while reducing the exclusion rate to about 4%, outperforming plain SIFT.
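For readers who want to reproduce such figures, the sketch below shows one way the three metrics could be computed from per‑query outcomes. The definitions are assumptions: the exclusion rate is taken to be the share of queries discarded before matching, and excluded queries are left out of precision and recall. The article does not spell out its exact formulas, and the sample data is made up for illustration.

```python
def retrieval_metrics(results):
    """Compute precision, recall, and exclusion rate.

    results: list of (predicted, actual) pairs, where predicted is
    True (duplicate), False (not duplicate), or None (query excluded),
    and actual is the ground-truth boolean.
    """
    total = len(results)
    excluded = sum(1 for pred, _ in results if pred is None)
    tp = sum(1 for pred, actual in results if pred is True and actual)
    fp = sum(1 for pred, actual in results if pred is True and not actual)
    fn = sum(1 for pred, actual in results if pred is False and actual)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "exclusion_rate": excluded / total if total else 0.0,
    }


# Hypothetical outcomes for ten queries.
sample = (
    [(True, True)] * 3      # true positives
    + [(True, False)] * 1   # false positive
    + [(False, True)] * 1   # false negative
    + [(False, False)] * 4  # true negatives
    + [(None, True)] * 1    # excluded query
)
metrics = retrieval_metrics(sample)
print(metrics)
```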
Conclusion: For on‑device image similarity, Hessian‑Affine detection paired with SIFT descriptors offers the best balance of computational cost, retrieval efficiency, and robustness. Future work may integrate local features with deep semantic features to further enhance performance.
Xianyu Technology
Official account of the Xianyu technology team