How to Build a Multimodal Product Search Engine with Embedding and Vector Retrieval on Elasticsearch Serverless

This article describes a complete multimodal product search solution. It combines text and image embeddings (from dense, sparse, and hybrid models), vector similarity metrics, and Elasticsearch Serverless features such as dense_vector, sparse_vector, hybrid search, quantization, and RRF ranking to deliver fast, accurate, and cost‑effective retrieval.


01 Overview of Multimodal Product Search

With the rapid development of AI, users now expect search experiences that go beyond simple keyword matching, allowing queries by images or natural language descriptions of complex scenes. Traditional text search struggles to understand visual elements or capture detailed visual attributes such as color, pattern, or style.

The solution rests on two core technologies: embedding (vectorization) and vector search. By converting both structured text and unstructured images into high‑dimensional vectors, the system can perform semantic and visual similarity retrieval.

02 Key Technologies

1. Embedding (Vectorization)

Embedding maps unstructured data (text, images) into machine‑readable vectors. Three model families are commonly used:

Dense Model: Generates dense vectors in which most dimensions are non‑zero, capturing deep semantic relationships. Typical models include Word2Vec, S‑BERT, and large language model (LLM) based encoders.

Sparse Model: Produces sparse vectors with only a few non‑zero dimensions, emphasizing exact term matching much like a traditional bag‑of‑words representation. Representative examples are BM25‑style lexical weighting and learned sparse encoders such as SPLADE.

Hybrid Model: Produces both a dense and a sparse vector for each input, combining semantic generalization with precise keyword matching; this typically outperforms either approach alone.
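To make the dense/sparse distinction concrete, here is a minimal sketch: a dense vector populates every dimension, a sparse vector keeps only weights for matched terms, and a hybrid system can blend the two similarity scores. The example values, the term weights, and the `hybrid_score` helper with its `alpha` weight are illustrative assumptions, not part of any particular model.

```python
# A dense embedding assigns a value to every dimension.
dense_query = [0.12, -0.43, 0.88, 0.05]

# A sparse embedding keeps weights only for the (few) relevant terms.
sparse_query = {"red": 1.7, "sneaker": 2.3}

def hybrid_score(dense_sim: float, sparse_sim: float, alpha: float = 0.5) -> float:
    """Blend dense and sparse similarity scores by linear interpolation."""
    return alpha * dense_sim + (1 - alpha) * sparse_sim

print(hybrid_score(0.8, 0.6))  # equal weighting of both signals
```

In practice the two scores come from separate retrieval passes, and the blend (or a rank-fusion method such as RRF) decides the final ordering.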

2. Vector Search

Vector search finds the K nearest neighbors (KNN) of a query vector in a high‑dimensional space. Common similarity measures include:

Euclidean Distance (L2): Direct geometric distance; often normalized to a 0–1 score using 1 / (1 + L2_norm^2).

Dot Product: Sum of element‑wise products; equivalent to cosine similarity when vectors are normalized.

Cosine Similarity: Measures the angle between vectors, ranging from -1 to 1, with 1 indicating identical direction.
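The three metrics above can be sketched in a few lines of plain Python; the sample vectors are arbitrary illustrations.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of vector norms."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_product(a, b) / (norm_a * norm_b)

a, b = [1.0, 0.0], [1.0, 1.0]
print(l2_distance(a, b))                  # 1.0
print(cosine_similarity(a, b))            # ~0.7071
print(1 / (1 + l2_distance(a, b) ** 2))   # L2 normalized to (0, 1]: 0.5
```

The last line is the normalization mentioned above, which maps an unbounded distance onto a bounded relevance score.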

Elasticsearch now supports vector fields such as dense_vector, sparse_vector, and semantic_text. It provides native KNN APIs, hybrid search (combining match and KNN), and ranking fusion via Reciprocal Rank Fusion (RRF) to balance scores from textual and vector results.
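The core of RRF is simple enough to sketch directly: each result list contributes 1 / (k + rank) per document, so documents ranked highly by both the textual and the vector pass rise to the top. The k = 60 default and the sample hit lists are illustrative.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Return document ids ordered by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["p3", "p1", "p7"]   # textual (match) recall
knn_hits  = ["p1", "p9", "p3"]   # vector (kNN) recall
print(rrf_fuse([bm25_hits, knn_hits]))  # → ['p1', 'p3', 'p9', 'p7']
```

Note that p1 wins despite topping only one list, because it appears near the top of both; rank fusion rewards agreement without needing to reconcile the two incomparable score scales.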

3. Performance Optimization with Quantization

High‑dimensional float32 vectors consume large memory. Quantization reduces storage by converting float32 values to lower‑precision formats:

Scalar Quantization (SQ): Maps each dimension from a 4‑byte float to a 1‑byte int8 (or even int4), achieving a 4×–8× memory reduction.

BBQ (Better Binary Quantization): An advanced technique that compresses vectors further, cutting memory usage by up to 95% while maintaining acceptable recall. Combined with HNSW graph indexing, it enables billion‑scale vector retrieval on modest hardware.
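A minimal sketch of the scalar-quantization idea: linearly map each float value onto the int8 range and keep the offset/scale so approximate values can be recovered. Real implementations (including Elasticsearch's) are more sophisticated, e.g. using per-segment statistics and confidence intervals; this only illustrates the 4-bytes-to-1-byte trade-off.

```python
def scalar_quantize(vec):
    """Min-max scalar quantization: map each float onto int8 (-128..127)."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0  # avoid div-by-zero
    quantized = [round((v - lo) / scale) - 128 for v in vec]
    return quantized, lo, scale

def dequantize(quantized, lo, scale):
    """Recover approximate float values from the int8 codes."""
    return [(q + 128) * scale + lo for q in quantized]

vec = [0.10, -0.35, 0.72, 0.01]
codes, lo, scale = scalar_quantize(vec)
restored = dequantize(codes, lo, scale)  # close to vec, within half a step
```

Each stored value shrinks from 4 bytes to 1, at the cost of a reconstruction error bounded by half a quantization step.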

03 Best Practices with Elasticsearch Serverless

The end‑to‑end workflow integrates Alibaba Cloud AI Search Open Platform and Elasticsearch Serverless:

Data Source: Product data (ID, text description, images) stored in RDS.

Offline Data Service: Extracts data from RDS.

Multimodal Vector Service: Calls built‑in AI models (e.g., M2‑Encoder, Qwen2‑VL) to generate unified multimodal vectors for text and images.

Data Ingestion: Writes both textual fields and vector fields into Elasticsearch Serverless indices.

Online Query: The front end sends a text or image query; the AI Search platform vectorizes it, then Elasticsearch Serverless performs combined textual and vector recall, followed by RRF re‑ranking and top‑N result return.
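The online-query step can be sketched as an Elasticsearch request body combining a `match` retriever and a `knn` retriever under RRF (the `retriever`/`rrf` syntax is available in recent Elasticsearch releases). The index and field names (`description`, `embedding`), the query text, and the query vector are illustrative assumptions, not fixed parts of the platform.

```python
# Hypothetical query vector produced by the multimodal vector service.
query_vector = [0.12, -0.43, 0.88]

# Hybrid recall: lexical match + kNN, fused with Reciprocal Rank Fusion.
hybrid_query = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"description": "red running shoes"}}}},
                {"knn": {"field": "embedding", "query_vector": query_vector,
                         "k": 10, "num_candidates": 100}},
            ]
        }
    },
    "size": 10,
}
```

This body would be sent to the index's `_search` endpoint; the two retrievers run independently and RRF merges their rankings server-side.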

Elasticsearch Serverless offers:

Zero Operations: No cluster management, automatic version upgrades, and built‑in security.

Pay‑as‑You‑Go: Compute Units (CU) billed per second, eliminating over‑provisioning.

Auto‑Scaling: Resources automatically expand or shrink based on load.

Seamless AI Model Integration: Native support for AI Search models via the Inference API, plus the ability to plug in custom external models.

Vector Optimizations: Default int8 or BBQ quantization, automatic exclusion of vector fields from _source, and adaptive pre‑warming of HNSW and quantized indexes to reduce cold‑start latency.
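As a sketch of how quantization is opted into at the mapping level, a `dense_vector` field can declare a quantized HNSW index type. `int8_hnsw` is a real `index_options` type in recent Elasticsearch versions (with `bbq_hnsw` in newer ones); the field names and dimension count here are illustrative.

```python
# Hypothetical index mapping: a text field plus a quantized vector field.
mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 768,                 # must match the embedding model
                "index": True,
                "similarity": "cosine",
                "index_options": {"type": "int8_hnsw"},  # quantized HNSW graph
            },
        }
    }
}
```

With a mapping like this, vectors are quantized at index time; on Serverless, the article notes this happens by default, together with excluding the raw vectors from `_source`.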

Architecture Diagram

04 Demo

A complete demo showcases the entire pipeline: ingesting product data, generating multimodal embeddings, indexing with Elasticsearch Serverless, and performing real‑time multimodal queries that return visually and semantically relevant products.

Thank you for reading.

Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.