How Multimodal Product Search Works: Embedding, Vector Retrieval, and Elasticsearch Serverless
This article explores the evolution from keyword to multimodal search, detailing a generic solution architecture, core embedding and vector retrieval technologies, and practical implementation using Alibaba Cloud AI Search Open Platform and Elasticsearch Serverless.
Multimodal Product Search Architecture
The system processes structured product metadata (titles, descriptions, categories, tags) and unstructured images. Text data is tokenized and indexed in a traditional text engine. Images are passed through a multimodal AI model (e.g., M2‑Encoder, Qwen2‑VL) to generate descriptive captions, which are then tokenized and indexed alongside the metadata.
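To make the captioning-and-indexing step concrete, here is a minimal sketch. It assumes the official Python Elasticsearch client; caption_image() is a hypothetical stand-in for a multimodal captioner such as M2‑Encoder or Qwen2‑VL, and the endpoint, index, and field names are illustrative.

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint and credentials.
es = Elasticsearch("https://<your-endpoint>:9243", api_key="<api-key>")

def caption_image(image_path: str) -> str:
    # Hypothetical stand-in for a multimodal captioner (e.g., M2-Encoder,
    # Qwen2-VL); a real system would invoke the model here.
    return "red waterproof hiking jacket with hood, outdoor scene"

product = {
    "id": "sku-1001",
    "title": "Women's waterproof hiking jacket",
    "category": "outdoor/apparel",
    "tags": ["waterproof", "hiking", "jacket"],
}

# The generated caption is indexed alongside the structured metadata, so a
# plain keyword search can also match visual content.
product["image_caption"] = caption_image("images/sku-1001.jpg")
es.index(index="products", id=product["id"], document=product)
```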
Embedding Layer
Both text and image data are transformed into high‑dimensional vectors. Dense models (Word2Vec, S‑BERT, LLM‑based) produce dense vectors that capture semantic similarity, while sparse models (BM25, SPLADE) generate high‑dimensional sparse vectors for exact term matching. A hybrid approach can combine dense and sparse representations for optimal recall and precision.
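As a rough illustration of the two representations, the sketch below uses an S‑BERT model from the sentence-transformers library for dense vectors, and a toy term-frequency dictionary as a stand-in for BM25/SPLADE-style sparse weights.

```python
from collections import Counter
from sentence_transformers import SentenceTransformer

texts = ["waterproof hiking jacket", "red running shoes"]

# Dense: an S-BERT model maps each text to a 384-dimensional semantic vector;
# nearby vectors mean semantically similar texts, even with no shared words.
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
dense_vecs = dense_model.encode(texts)  # numpy array of shape (2, 384)

# Sparse (toy stand-in for BM25/SPLADE weights): term -> weight pairs that
# are non-zero only for terms actually present, preserving exact matching.
sparse_vecs = [dict(Counter(text.split())) for text in texts]

print(dense_vecs.shape)  # (2, 384)
print(sparse_vecs[0])    # {'waterproof': 1, 'hiking': 1, 'jacket': 1}
```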
Query & Retrieval Layer
Text query: matched against the text engine (keyword match) and also encoded into a query vector for similarity search in the vector engine.
Image query: the uploaded image is encoded into a vector and used for nearest‑neighbor search.
Results from both paths are merged and re‑ranked with Reciprocal Rank Fusion (RRF) before the top‑N items are returned; a minimal RRF sketch follows.
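RRF scores each document by summing 1 / (k + rank) over every result list it appears in, where k is a smoothing constant (commonly 60). The sketch below implements that formula; the document IDs are illustrative.

```python
def rrf_merge(result_lists, k=60, top_n=10):
    # RRF: score(d) = sum over result lists of 1 / (k + rank_of_d_in_list).
    # k (commonly 60) damps the influence of top ranks in any single list.
    scores = {}
    for results in result_lists:  # each list is ranked best-first
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Illustrative ranked ID lists from the keyword path and the vector path.
keyword_hits = ["p3", "p1", "p7", "p2"]
vector_hits = ["p1", "p4", "p3", "p9"]
print(rrf_merge([keyword_hits, vector_hits]))  # items ranked well in both paths rise
```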
Key Technologies
Embedding (Vectorization)
Embedding converts unstructured data into fixed‑length numerical vectors. Dense vectors capture deep semantics; sparse vectors preserve exact keyword matches. Hybrid models generate both representations for each document.
Vector Retrieval
Similarity is measured with Euclidean distance, dot product, or cosine similarity. Elasticsearch supports dense_vector, sparse_vector, and semantic_text field types, native K‑Nearest Neighbor (KNN) search, hybrid (text + vector) queries, and quantization.
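A minimal sketch of a dense_vector mapping and a native KNN query with the Python client follows; the endpoint, index name, and credentials are placeholders, and the 384 dimensions match the S‑BERT example above.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("https://<your-endpoint>:9243", api_key="<api-key>")  # placeholders

# Index a dense_vector field with cosine similarity for approximate KNN.
es.indices.create(
    index="products-vec",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "title_vec": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

# Encode the query and run native KNN: return the k nearest of the
# num_candidates considered per shard (higher = better recall, more work).
query_vec = SentenceTransformer("all-MiniLM-L6-v2").encode("waterproof jacket")
resp = es.search(
    index="products-vec",
    knn={"field": "title_vec", "query_vector": query_vec.tolist(),
         "k": 10, "num_candidates": 100},
)
```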
Quantization for Performance
Scalar Quantization (SQ) maps float32 vectors to int8 or int4, reducing memory usage by 75 % (int8) or 87.5 % (int4). Better Binary Quantization (BBQ) can cut memory by up to 95 % with modest recall loss, enabling billion‑scale vector search.
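In Elasticsearch, quantization is a per‑field choice in the mapping. The index_options types below come from recent open‑source releases (int8_hnsw, int4_hnsw, bbq_hnsw); availability may vary by Serverless version.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://<your-endpoint>:9243", api_key="<api-key>")  # placeholders

# Quantized HNSW index for a dense_vector field. Swap index_options.type:
#   "int8_hnsw" -> 1 byte/dim   (75 % smaller than float32)
#   "int4_hnsw" -> 0.5 byte/dim (87.5 % smaller)
#   "bbq_hnsw"  -> ~1 bit/dim   (up to ~95 % smaller, modest recall loss)
es.indices.create(
    index="products-quantized",
    mappings={
        "properties": {
            "image_vec": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
                "index_options": {"type": "int8_hnsw"},
            }
        }
    },
)
```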
Implementation on Alibaba Cloud
AI Search Open Platform: extracts product data from RDS, runs multimodal models to generate unified vectors, and provides pipelines to ingest vectors into Elasticsearch.
Elasticsearch Serverless: a fully managed, auto‑scaling Elasticsearch service that supports dense/sparse vectors, native KNN, hybrid search, and quantization (int8, BBQ). It offers zero‑ops scaling, built‑in monitoring, and an Inference API for real‑time model integration.
Data Flow
1. Extract product records (ID, text, image) from RDS.
2. Use AI Search Open Platform to generate multimodal embeddings.
3. Ingest text fields and vector fields into Elasticsearch Serverless indices.
4. At query time, route user text or image to the same embedding service, obtain a query vector, and perform simultaneous text match and vector KNN search.
5. Merge results with RRF and return the top‑N hits (the query‑time steps are sketched below).
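A minimal sketch of the query‑time path: embed the query with the same model used at ingest time, then issue one request that runs the keyword match and the KNN search and fuses them server‑side. The retriever/rrf syntax follows recent open‑source Elasticsearch (8.14+) and may differ slightly on a given Serverless version; endpoint and names are placeholders.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("https://<your-endpoint>:9243", api_key="<api-key>")  # placeholders

# Embed the user's text query with the same model used at ingest time.
query_text = "waterproof hiking jacket"
query_vec = SentenceTransformer("all-MiniLM-L6-v2").encode(query_text)

# One request: keyword match + vector KNN, fused server-side with RRF.
resp = es.search(
    index="products-vec",
    retriever={
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"title": query_text}}}},
                {"knn": {"field": "title_vec", "query_vector": query_vec.tolist(),
                         "k": 10, "num_candidates": 100}},
            ]
        }
    },
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```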
Elasticsearch Vector Features
dense_vector – stores dense embeddings.
sparse_vector – stores high‑dimensional sparse embeddings.
semantic_text – automatically maps text to vectors via configured inference models (see the sketch after this list).
Native KNN API for dense_vector fields.
Hybrid search combining match queries with KNN.
RRF fusion for robust ranking across heterogeneous result sets.
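semantic_text wires a field to an inference endpoint so that chunking and embedding happen inside Elasticsearch at both index and query time. A minimal sketch under that assumption follows; "my-embedder" is a hypothetical endpoint ID (created as in the Inference API sketch near the end of this article), and the syntax comes from recent open‑source releases.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://<your-endpoint>:9243", api_key="<api-key>")  # placeholders

# semantic_text delegates chunking and embedding to the inference endpoint,
# so documents are indexed as plain text with no client-side vector step.
es.indices.create(
    index="products-semantic",
    mappings={
        "properties": {
            "description": {"type": "semantic_text", "inference_id": "my-embedder"}
        }
    },
)
es.index(index="products-semantic",
         document={"description": "Lightweight waterproof jacket for hiking."})

# The semantic query embeds the query string with the same endpoint.
resp = es.search(
    index="products-semantic",
    query={"semantic": {"field": "description", "query": "rain gear for trekking"}},
)
```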
Performance Optimizations
Scalar Quantization: reduces float32 (4 bytes) to int8 (1 byte) or int4 (0.5 bytes) per dimension.
BBQ: compresses further, to roughly one bit per dimension, achieving up to 95 % memory reduction; recall loss can be mitigated by increasing num_candidates in KNN search (a back‑of‑envelope calculation follows this list).
HNSW graph indexing combined with quantization enables large‑scale retrieval with modest hardware.
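The savings are easy to sanity‑check. The numbers below assume 100 million 768‑dimensional vectors and count vector data only, ignoring HNSW graph and metadata overhead, which is why quoted figures say "up to" 95 % rather than the raw 32x of 1‑bit encoding.

```python
# Memory per encoding for 100 million 768-dim vectors (vector data only).
n, dims = 100_000_000, 768
encodings = {
    "float32": 4.0,    # bytes per dimension
    "int8":    1.0,    # 75% reduction
    "int4":    0.5,    # 87.5% reduction
    "bbq":     1 / 8,  # ~1 bit per dimension
}
for name, bytes_per_dim in encodings.items():
    gb = n * dims * bytes_per_dim / 1e9
    print(f"{name:7s} ~{gb:7.1f} GB")
# float32 ~307.2 GB | int8 ~76.8 GB | int4 ~38.4 GB | bbq ~9.6 GB
```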
Serverless Advantages
Zero operational overhead – no cluster management, automatic version upgrades, and built‑in monitoring.
Pay‑as‑you‑go CU‑based billing with second‑level granularity.
Automatic scaling and resource auto‑tuning based on traffic.
Seamless integration of built‑in AI models via the Inference API; custom models can be added through simple API configuration (a sketch follows this list).
Vector‑specific optimizations: source field exclusion for vectors, one‑click int8/BBQ quantization, adaptive pre‑warming of HNSW indexes.
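As an illustration of the Inference API, the REST sketch below creates a text‑embedding endpoint and calls it directly. The alibabacloud-ai-search service name and its settings follow the open‑source Elasticsearch documentation; the host, workspace, and model ID are placeholders, and exact fields may differ by version and deployment, so treat this as a template.

```python
import requests

ES = "https://<your-endpoint>:9243"  # placeholder
HEADERS = {"Authorization": "ApiKey <api-key>", "Content-Type": "application/json"}

# Create an inference endpoint backed by an external embedding service.
requests.put(
    f"{ES}/_inference/text_embedding/my-embedder",
    headers=HEADERS,
    json={
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": "<platform-api-key>",
            "service_id": "ops-text-embedding-001",  # example model ID
            "host": "<workspace-host>",
            "workspace": "default",
        },
    },
)

# Call the endpoint directly to embed a query string in real time.
resp = requests.post(
    f"{ES}/_inference/text_embedding/my-embedder",
    headers=HEADERS,
    json={"input": "waterproof hiking jacket"},
)
print(resp.json())
```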
This architecture enables a cost‑effective, high‑performance multimodal product search system that leverages modern embedding techniques, efficient vector retrieval, and serverless cloud infrastructure.