
From Text to Images: Building Multimodal Product Search with Elasticsearch Serverless

This article examines the evolution of e‑commerce search from simple keyword matching to multimodal, cross‑modal retrieval, explains the core embedding and vector‑search technologies, compares dense, sparse and hybrid models, and demonstrates how Elasticsearch Serverless and Alibaba Cloud AI Search Platform enable a low‑cost, serverless, high‑performance end‑to‑end multimodal product search solution.

DataFunSummit

Jun 18, 2026


Introduction

With rapid advances in AI, user expectations for search have moved beyond plain keyword matching toward multimodal and cross‑modal capabilities. Users now want to find products by uploading images or describing complex scenes in natural language, which traditional text‑only search cannot satisfy.

Multimodal Product Search Solution

The solution consists of three layers: data processing, query & recall, and fusion & ranking.

1. Data Processing Layer

Text metadata processing : Structured fields such as title, description, category, and tags are tokenized and indexed in a traditional text engine.

Image processing : Images are fed to a multimodal large model which generates descriptive text. The description is then tokenized and indexed like other text fields.

Embedding (vectorization) : Both the generated text and the raw image are transformed into high‑dimensional vectors and stored in a vector engine.

2. Query & Recall Layer

Text query : The query keyword is matched directly in the text engine (exact match) and also converted to a query vector via embedding for semantic matching in the vector engine.

Image query : The uploaded image is encoded into a query vector, which is used to retrieve visually similar items from the vector engine.

3. Fusion & Ranking Layer

Rerank & Score : Results from the text and vector engines are merged and re‑ranked, combining textual relevance scores with vector similarity scores.

Result Return : The final top‑N items are presented to the user.
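One simple way to combine textual relevance scores with vector similarity scores is a weighted sum over normalized scores. The sketch below is an illustration only, not the platform's actual ranking logic: the function names, the min-max normalization, the 0.5 weight, and the sample scores are all assumptions.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every hit as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(text_scores, vector_scores, text_weight=0.5, top_n=3):
    """Blend normalized text and vector scores.

    A document missing from one recall path contributes 0 for that path.
    """
    text_norm = min_max_normalize(text_scores)
    vec_norm = min_max_normalize(vector_scores)
    fused = {}
    for doc_id in set(text_norm) | set(vec_norm):
        fused[doc_id] = (text_weight * text_norm.get(doc_id, 0.0)
                         + (1.0 - text_weight) * vec_norm.get(doc_id, 0.0))
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical scores from the two engines.
text_hits = {"sku-1": 12.0, "sku-2": 8.0, "sku-3": 4.0}       # e.g. BM25 scores
vector_hits = {"sku-2": 0.95, "sku-4": 0.90, "sku-1": 0.70}   # e.g. cosine similarities

ranked = fuse_scores(text_hits, vector_hits)
print(ranked)
```

Normalizing first matters because text scores and similarity scores live on different scales; without it, the path with the larger raw numbers would dominate the blend.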

Key Technologies

Embedding (Vectorization)

Embedding maps unstructured data (text, images) into machine‑readable vectors. Three model families are discussed:

Dense Model (e.g., Word2vec, S‑BERT, LLM‑based models): Produces dense vectors where most dimensions are non‑zero, capturing deep semantic relationships.

Sparse Model (e.g., BM25, SPLADE): Generates high‑dimensional sparse vectors with only a few non‑zero entries, emphasizing exact term matching.

Hybrid Model : Combines dense and sparse vectors to obtain both semantic generalization and precise keyword matching, which typically outperforms either family alone in retrieval benchmarks.
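The difference between dense and sparse representations can be made concrete with toy vectors. In the sketch below, every number is hand-written for illustration and does not come from any real model; the function names are likewise assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller mapping
    return sum(weight * b[term] for term, weight in a.items() if term in b)


# Dense: every dimension carries a value, so related items land close together
# even when they share no words.
dense_query = [0.9, 0.1, 0.3]     # "red running shoes"
dense_sneaker = [0.8, 0.2, 0.4]   # "crimson sneakers for jogging"
dense_kettle = [0.1, 0.9, 0.2]    # "stainless steel kettle"

# Sparse: only terms that occur (or are expanded) get a weight; the rest are
# implicitly zero, so the score comes from overlapping terms.
sparse_query = {"red": 1.2, "running": 0.9, "shoes": 1.5}
sparse_sneaker = {"red": 1.0, "sneakers": 1.4, "running": 0.7}
sparse_kettle = {"steel": 1.1, "kettle": 1.6}

print(cosine_similarity(dense_query, dense_sneaker))
print(cosine_similarity(dense_query, dense_kettle))
print(sparse_dot(sparse_query, sparse_sneaker))
print(sparse_dot(sparse_query, sparse_kettle))
```

A hybrid setup would score each candidate with both functions and merge the two result lists.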

Vector Retrieval

Vector retrieval finds the K‑nearest neighbors (KNN) of a query vector based on distance metrics such as Euclidean distance, dot product, or cosine similarity. Elasticsearch now provides native support for vector fields:

dense_vector : Stores dense vectors.

sparse_vector : Stores sparse vectors efficiently.

semantic_text : An abstract type that automatically maps text to the appropriate vector type via a configured inference model.
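The K‑nearest‑neighbor lookup itself can be sketched as an exact, brute-force scan that scores every stored vector against the query. This is for illustration only: production engines use approximate indexes rather than a full scan, and the catalog, vectors, and function names here are made up.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def knn(query, vectors, k, metric="cosine"):
    """Return the ids of the k stored vectors closest to the query (exact search)."""
    if metric == "euclidean":
        # Smaller distance means more similar, so sort ascending.
        ranked = sorted(vectors, key=lambda d: euclidean_distance(query, vectors[d]))
    elif metric == "dot_product":
        # Larger value means more similar, so sort descending.
        ranked = sorted(vectors, key=lambda d: -dot_product(query, vectors[d]))
    elif metric == "cosine":
        ranked = sorted(vectors, key=lambda d: -cosine_similarity(query, vectors[d]))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return ranked[:k]


# Hypothetical two-dimensional product vectors.
catalog = {
    "sku-1": [1.0, 0.0],
    "sku-2": [0.8, 0.6],
    "sku-3": [0.0, 1.0],
    "sku-4": [3.0, 3.0],  # long vector pointing in a different direction
}
query = [1.0, 0.2]

print(knn(query, catalog, k=2, metric="cosine"))
print(knn(query, catalog, k=2, metric="euclidean"))
print(knn(query, catalog, k=2, metric="dot_product"))
```

The dot-product ranking differs from the other two because it rewards vector length as well as direction, which is why vectors are commonly normalized before dot product is used as a similarity.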

Additional features include:

Inference API : Calls external AI models (e.g., embedding models) during indexing or querying.

Ingest Pipeline : Uses processors such as the inference processor (for example, with a text embedding model) to convert fields to vectors on the fly.

KNN Search : Native approximate nearest neighbor API for efficient vector search.

Hybrid Search : Executes both traditional match queries and KNN vector queries in a single request.

Reciprocal Rank Fusion (RRF) : Merges results from different recall sets based on ranking positions, improving robustness.
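Reciprocal Rank Fusion is small enough to sketch in full. Each document receives 1 / (rank_constant + rank) from every result list it appears in, and the contributions are summed. The function name and sample rankings below are illustrative assumptions; the constant 60 is the commonly used default.

```python
def reciprocal_rank_fusion(rankings, rank_constant=60, top_n=None):
    """Fuse several ranked lists of doc ids into one list of (doc_id, score).

    Only positions are used, so the raw scores of each recall path never
    need to be put on a common scale.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused if top_n is None else fused[:top_n]


# Hypothetical result lists from the text path and the vector path.
text_ranking = ["sku-1", "sku-2", "sku-3"]
vector_ranking = ["sku-2", "sku-4", "sku-1"]

fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Documents that appear in both lists (sku-1 and sku-2 here) rise above documents found by only one path, which is the robustness property the method is known for.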

Performance Optimization via Quantization

When handling billions of high‑dimensional vectors, memory consumption becomes a bottleneck. Quantization reduces the storage size of vectors:

Scalar Quantization (SQ) : Converts 32‑bit floats to 8‑bit (or 4‑bit) integers, achieving a 4×–8× reduction in memory.

Better Binary Quantization (BBQ) : A binary quantization technique in Elasticsearch, building on the RaBitQ method from researchers at Nanyang Technological University, that can lower vector memory usage by up to 95% with a controllable loss in recall. Combined with HNSW graph indexing, it enables billion‑scale vector search with dramatically fewer compute nodes.
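The memory arithmetic behind these figures, together with a simplified form of scalar quantization, can be sketched as follows. This is a deliberately naive min/max quantizer written for illustration; it is not Elasticsearch's implementation, and the function names, the sample vector, and the 1024-dimension figure are assumptions.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-128, 127] using the vector's own min/max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Approximately recover the original floats from the integer codes."""
    return [(c + 128) * scale + lo for c in codes]


def bytes_per_vector(dims, bits_per_dim):
    """Raw storage for one vector, ignoring any per-vector correction terms."""
    return dims * bits_per_dim // 8


vector = [0.12, -0.48, 0.33, 0.91, -0.07]  # made-up embedding values
codes, lo, scale = quantize_int8(vector)
restored = dequantize_int8(codes, lo, scale)
max_error = max(abs(x - y) for x, y in zip(vector, restored))

print(codes)
print(max_error)

dims = 1024  # assumed embedding size
for label, bits in [("float32", 32), ("int8", 8), ("int4", 4), ("1-bit", 1)]:
    print(label, bytes_per_vector(dims, bits), "bytes")
```

Under these assumptions, int8 and int4 give the 4× and 8× reductions quoted above, and a 1-bit code is 32× smaller than float32 before any per-vector overhead is added back.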

Elasticsearch Serverless

Elasticsearch Serverless is a fully managed, serverless search service on Alibaba Cloud. Its architecture includes a Serverless Proxy that abstracts cluster details, read‑write separated Elasticsearch clusters, and automated management components.

Core Advantages

Zero Operations : No need to manage clusters, nodes, or shards; users interact with a logical application.

Cost Efficiency : Pay‑as‑you‑go based on Compute Units (CU) with second‑level granularity.

High Elasticity : Automatic scaling of resources and configuration (e.g., replica count) based on real‑time load.

Seamless AI Model Integration : Built‑in AI Search Platform models (e.g., M2‑Encoder, Qwen2‑VL) are callable via the Inference API; custom models can also be integrated.

Vector‑Specific Optimizations : Default exclusion of vector fields from _source, one‑click activation of int8 or BBQ quantization, and automatic pre‑warming of HNSW and quantized index files to reduce cold‑start latency.
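For readers used to self-managed Elasticsearch, the vector-specific settings above roughly correspond to an index mapping like the one sketched below. The article states that the serverless service applies these options by default or with one click, so this is only an approximation of the equivalent manual configuration; the field names, dimension count, and helper function are hypothetical.

```python
import json


def build_product_mapping(dims=1024, quantization="int8_hnsw"):
    """Build an index body with a quantized vector field kept out of _source.

    `quantization` is passed as the dense_vector index_options type,
    for example "int8_hnsw" or "bbq_hnsw".
    """
    return {
        "mappings": {
            # Keep the large vector out of the stored _source document.
            "_source": {"excludes": ["image_vector"]},
            "properties": {
                "title": {"type": "text"},
                "description": {"type": "text"},
                "image_vector": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                    "index_options": {"type": quantization},
                },
            },
        }
    }


mapping = build_product_mapping()
print(json.dumps(mapping, indent=2))
```

Excluding the vector from _source saves storage, at the cost that the original vector is no longer returned with the document or available for reindexing from _source.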

End‑to‑End Demo

The demo walks through the complete workflow:

Data resides in an RDS instance (product ID, text description, image URL).

An offline data service extracts records and sends them to the AI Search Platform, where multimodal models generate unified vectors.

Processed text and vectors are ingested into Elasticsearch Serverless, populating both the text and vector indexes.

At query time, a front‑end application submits either a text string or an image; the query is vectorized via the AI Search Platform and sent to Elasticsearch Serverless for multi‑path recall.

Elasticsearch returns the top‑N results after Rerank & Score, which are displayed to the user.
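The query step of the workflow above can be sketched as a single hybrid search request body. This assumes the retriever syntax found in recent Elasticsearch releases, which the serverless service may or may not expose in the same form. The field names and default sizes are hypothetical, and the query vector is assumed to have already been produced by the embedding model.

```python
import json


def build_hybrid_request(query_text, query_vector, size=10, k=50, num_candidates=200):
    """Build a search body that runs a text match and a KNN search, fused with RRF."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    {   # Path 1: keyword recall over the text fields.
                        "standard": {
                            "query": {
                                "multi_match": {
                                    "query": query_text,
                                    "fields": ["title", "description"],
                                }
                            }
                        }
                    },
                    {   # Path 2: approximate nearest-neighbor recall over the vector field.
                        "knn": {
                            "field": "image_vector",
                            "query_vector": query_vector,
                            "k": k,
                            "num_candidates": num_candidates,
                        }
                    },
                ],
                "rank_window_size": k,
                "rank_constant": 60,
            }
        },
        "size": size,
    }


# A placeholder vector stands in for the embedding of the user's text or image.
body = build_hybrid_request("red running shoes", [0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

For an image query with no accompanying text, the same body would carry only the knn retriever.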

This pipeline showcases how to rapidly prototype a multimodal product search system with minimal operational overhead and scalable performance.

Conclusion

The combination of modern embedding techniques, efficient vector retrieval, quantization methods, and the serverless capabilities of Elasticsearch provides a powerful, low‑cost solution for next‑generation e‑commerce search scenarios.


