
Overview of Image Search System

This article explains the fundamentals of building a search-by-image system, covering image feature extraction methods such as hashing, traditional descriptors, and CNN‑based vectors, along with the use of vector search engines like Milvus for similarity retrieval.


Image search by example refers to retrieving images that are visually similar to a given query image based on its content.

Building an image‑search system requires solving two critical problems: first, extracting meaningful image features; second, providing a feature‑data search engine that stores those features in a database and supports similarity queries.

Image Feature Representation

Three main approaches are introduced.

Image Hash

An image hash is a set of hash values obtained after a series of transformations and processing steps; the transformations constitute the hash algorithm.

The hash value serves as an overall abstract representation of the image.

For example, the Average Hash algorithm works as follows:

Steps:

1. Reduce size: resize the original image to 8 × 8 (64 pixels), discarding fine details.

2. Reduce color: convert to grayscale, obtaining a 64‑level gray image.

3. Average the colors: compute the mean of the 64 gray values.

4. Compute the bits: binarize each pixel by comparing it with the mean, assigning 0 or 1.

5. Construct the hash: arrange the 64 bits into a 64‑bit integer (e.g., left‑to‑right, top‑to‑bottom), which is the final average hash value.

Reference: http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html
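The steps above can be sketched in pure Python. This is a minimal illustration that assumes steps 1–2 (resizing to 8 × 8 and converting to grayscale) have already been done, so the input is simply 64 gray values:

```python
def average_hash(gray_8x8):
    """Compute a 64-bit average hash from an 8x8 grayscale image.

    gray_8x8: list of 64 gray values (row-major, 0-255), i.e. the
    result of steps 1-2 (resize to 8x8 and convert to grayscale).
    """
    assert len(gray_8x8) == 64
    mean = sum(gray_8x8) / 64                         # step 3: average the colors
    bits = [1 if p >= mean else 0 for p in gray_8x8]  # step 4: binarize against the mean
    # Step 5: pack the 64 bits left-to-right, top-to-bottom into one integer.
    h = 0
    for b in bits:
        h = (h << 1) | b
    return h

# A toy "image": top half dark (gray 40), bottom half bright (gray 200).
pixels = [40] * 32 + [200] * 32
print(hex(average_hash(pixels)))  # 0xffffffff (only the bottom-half bits are set)
```

In a real system the resize and grayscale conversion would be handled by an image library such as Pillow or OpenCV; only the hashing logic is shown here.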

Common image‑hash algorithms include:

AverageHash : based on the mean gray value (aHash)

PHash : Perceptual Hash, based on the discrete cosine transform

MarrHildrethHash : based on the Marr‑Hildreth operator

RadialVarianceHash : based on the Radon transform

BlockMeanHash : based on block means

ColorMomentHash : based on color moments

In practice, the most frequently used of these is PHash.

Image hashes can tolerate moderate watermarks, compression, and noise; similarity can be judged by the Hamming distance between hash values.
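Since the hashes are plain integers, the Hamming distance is just the number of differing bits. A minimal sketch (the threshold for "similar" varies by algorithm; small distances such as ≤ 5 are commonly treated as a match, but this is a tunable assumption):

```python
def hamming_distance(h1, h2):
    """Number of differing bits between two hash values."""
    return bin(h1 ^ h2).count("1")

# Two hashes differing in exactly 3 bit positions.
a = 0b1011_0010
b = 0b1010_0111
print(hamming_distance(a, b))  # 3
```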

However, because a hash abstracts the whole image, it is sensitive to global changes—adding a black border, for instance, can break similarity detection.

Traditional Features

Early computer‑vision research produced classic feature algorithms such as SIFT :

The SIFT algorithm extracts a set of local keypoints, each represented by a high‑dimensional vector. To compute similarity, these local descriptors are typically aggregated into a single global feature vector using methods such as:

BOW (Bag of Words)

Fisher vector

VLAD
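As an illustration of the simplest of these, Bag of Words aggregation assigns each local descriptor to its nearest "visual word" in a codebook (usually learned offline with k-means) and counts the assignments. A minimal sketch with a hypothetical 2-D codebook; real SIFT descriptors are 128-dimensional, but the logic is identical:

```python
def bow_aggregate(descriptors, codebook):
    """Aggregate local descriptors into a global Bag-of-Words histogram.

    descriptors: list of local feature vectors (e.g. SIFT keypoints)
    codebook: list of cluster centers ("visual words")
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    hist = [0] * len(codebook)
    for d in descriptors:
        # Assign the descriptor to its nearest visual word and count it.
        nearest = min(range(len(codebook)), key=lambda i: sq_dist(d, codebook[i]))
        hist[nearest] += 1
    return hist

codebook = [(0.0, 0.0), (10.0, 10.0)]
descs = [(0.5, 1.0), (9.0, 11.0), (10.5, 9.5)]
print(bow_aggregate(descs, codebook))  # [1, 2]
```

The resulting histogram is the single global vector used for similarity comparison; Fisher vectors and VLAD refine this idea by also encoding how descriptors deviate from the codewords.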

CNN Features

Since the rise of deep learning, convolutional neural networks (CNNs) have become the dominant way to extract image features.

Features extracted by a CNN are also high‑dimensional vectors; for example, the VGG16 model can be used to obtain features (see https://keras.io/applications/#extract-features-with-vgg16).
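Once a feature vector has been obtained for each image, comparing two images reduces to comparing two vectors; cosine similarity is a common choice for CNN features. A minimal pure-Python sketch (the toy 3-dimensional vectors stand in for the model's much larger output):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # 1.0
```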

Search Engine

Because images are represented as feature vectors, the search engine essentially performs vector retrieval.

A practical open‑source solution is Milvus , which can be quickly integrated into projects; refer to its official documentation for detailed usage.
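Conceptually, what such an engine provides is top-k nearest-neighbor retrieval over stored feature vectors. The brute-force sketch below shows the idea with a tiny in-memory collection and hypothetical image ids; an engine like Milvus does the same job using approximate indexes that scale to millions of vectors:

```python
def search(index, query, k=2):
    """Return the ids of the k stored vectors closest to the query (L2 distance)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    ranked = sorted(index.items(), key=lambda item: sq_dist(item[1], query))
    return [img_id for img_id, _ in ranked[:k]]

# A tiny in-memory "collection": image id -> feature vector.
index = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}
print(search(index, [1.0, 0.0, 0.0], k=2))  # ['cat_1', 'cat_2']
```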

Written by

System Architect Go

Programming, architecture, application development, message queues, middleware, databases, containerization, big data, image processing, machine learning, AI, personal growth.
