
Applying CLIP and Milvus for Image Similarity Search in E‑commerce Risk Control

The article explains how an e‑commerce risk‑control team leverages OpenAI's CLIP model to generate image and text embeddings and stores them in the Milvus cloud‑native vector database to enable fast, scalable similarity searches for compliance verification and risk detection.

Zhuanzhuan Tech

1 Background

As the risk‑control department of an e‑commerce platform, we need to retrospectively verify the compliance of product images and user avatars. Traditional offline processing with heavy models consumes large compute resources and takes days. By extracting feature vectors with deep models, storing them in a vector database, and searching with query vectors, we can quickly retrieve similar images. The CLIP model provides both text and image embeddings, while Milvus offers scalable storage and retrieval of massive vectors.

2 CLIP Model

2.1 About the CLIP Model

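CLIP (Contrastive Language‑Image Pre‑training) is a multimodal model developed by OpenAI. It is trained contrastively on image‑text pairs, so an image and a matching description land close together in a shared embedding space. This joint representation is what lets the same model serve both image classification from text prompts and text‑to‑image retrieval.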

2.2 Applications of CLIP

In Zhuanzhuan risk control, CLIP is used in two ways: to classify images against textual prompts, and to generate feature vectors that are persisted in Milvus for later similarity search.
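
Prompt‑based classification comes down to comparing one image embedding against the embedding of each candidate prompt and picking the closest. The sketch below shows only that scoring step, assuming the embeddings already exist. The toy 3‑D vectors, the prompt strings, and the `scale` value are hypothetical stand‑ins for real CLIP outputs, not values from our system.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_by_prompts(image_vec, prompt_vecs, scale=100.0):
    """Return (best_label, probabilities) for one image embedding.

    prompt_vecs maps a label to the embedding of its text prompt.
    scale is an assumed temperature-like factor applied before softmax.
    """
    labels = list(prompt_vecs)
    logits = [scale * cosine_similarity(image_vec, prompt_vecs[label])
              for label in labels]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))

# Toy 3-D vectors standing in for real CLIP embeddings (hypothetical values).
image_embedding = [0.9, 0.1, 0.2]
prompt_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a phone": [0.0, 1.0, 0.3],
}
label, probabilities = classify_by_prompts(image_embedding, prompt_embeddings)
print(label)
```

In a real pipeline the vectors would come from the CLIP image and text encoders; the comparison logic stays the same.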

3 Milvus

3.1 What is Milvus

Milvus is a cloud‑native vector database designed for high availability, high performance, and easy scalability. It builds on established vector search libraries and algorithms such as FAISS, Annoy, and HNSW.

3.2 Core Concepts

Unstructured Data: Data without a fixed schema, such as text, images, audio, or video, which can be converted to vectors for processing.
Feature Vector: A continuous n‑dimensional array derived from embeddings.
Vector Similarity Search: Retrieves the most similar vectors using approximate nearest‑neighbor (ANN) algorithms.
Collection: Equivalent to a table, containing a set of entities.
Entity: Equivalent to a row, composed of fields.
Field: A column within an entity; it can hold scalar or vector data.
Partition: A sub‑division of a collection that physically separates data.
Index: A structure built on raw vectors (e.g., inverted lists, k‑d trees, high‑dimensional hashing) to accelerate search.

3.3 Similarity Calculation Principle

Common similarity metrics include cosine similarity, Euclidean distance, and Hamming distance. Euclidean distance generalizes directly from 2‑D and 3‑D to n‑dimensional space: it is the square root of the sum of squared differences between corresponding coordinates. (The original illustrations are omitted here.)
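
The three metrics named above can be written in a few lines of plain Python. This is a minimal illustration of the definitions, not how Milvus computes them internally; the function names are our own.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

d2 = euclidean_distance([0, 0], [3, 4])          # 2-D case
d3 = euclidean_distance([1, 2, 2], [0, 0, 0])    # 3-D case
cos = cosine_similarity([1, 0], [0, 1])          # orthogonal vectors
ham = hamming_distance([1, 0, 1, 1], [1, 1, 0, 1])
print(d2, d3, cos, ham)  # prints: 5.0 3.0 0.0 2
```

Euclidean and Hamming are distances (smaller means more similar), while cosine similarity is a similarity (larger means more similar), so ranking direction depends on the metric chosen.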

3.4 Milvus System Architecture

Milvus 2.0 follows a cloud‑native, stateless design with four layers:
Access Layer: Stateless proxies that expose endpoints and handle client authentication.
Coordinator Service: The brain of the system; it schedules tasks through the root, data, query, and index coordinators.
Worker Node: Executes commands; includes data, query, and index nodes.
Storage: Persists data via the meta store, log broker, and object storage.
Each layer can be scaled, and can recover from failures, independently of the others.

3.5 Reasons to Choose Milvus

High Performance: Supports massive vector similarity search using FAISS, SPTAG, and optimized NSG graphs; can handle billion‑scale vectors on a single machine.
High Availability & Reliability: Cloud‑native deployment ensures resilience.
Hybrid Query: Allows scalar filtering alongside vector search.
Developer Friendly: Multi‑language SDKs (Java, Python, C++, REST) simplify integration.

4 Practice in Zhuanzhuan Risk Control

4.1 Milvus Deployment Options

Milvus can be deployed as a single‑node Docker‑Compose instance or as a Kubernetes cluster. We use the single‑node mode for its simplicity and sufficient performance for our use case.

4.2 Feature Vector Generation

We generate embeddings with the CLIP model, which saves compute resources by reusing an already‑deployed model and provides strong cross‑modal understanding for text‑to‑image search.

4.3 Index Structure Selection

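Milvus supports index types based on Annoy, FAISS, HNSW, DiskANN, and others. Our scenario runs offline and requires 100% recall, so we use the FAISS Flat index, which compares the query against every stored vector, rather than a compressed approximate index that trades recall for speed.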
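
The reason a flat index gives full recall is that it performs an exhaustive scan: no candidate is ever skipped. The sketch below illustrates that idea in plain Python with toy 2‑D vectors. It is not the Milvus or FAISS implementation, and the `flat_search` name and sample IDs are hypothetical.

```python
import heapq
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(query, vectors, top_k):
    """Exhaustive search: score every stored vector, keep the top_k closest.

    vectors maps an ID to its vector.
    Returns a list of (id, distance) pairs, nearest first.
    """
    scored = ((l2_distance(query, vec), vec_id)
              for vec_id, vec in vectors.items())
    nearest = heapq.nsmallest(top_k, scored)
    return [(vec_id, dist) for dist, vec_id in nearest]

# Toy data (hypothetical); real CLIP vectors have hundreds of dimensions.
stored = {
    "img-1": [0.0, 0.0],
    "img-2": [1.0, 1.0],
    "img-3": [5.0, 5.0],
    "img-4": [0.9, 1.2],
}
results = flat_search([1.0, 1.0], stored, top_k=2)
print([vec_id for vec_id, _ in results])  # prints: ['img-2', 'img-4']
```

The cost grows linearly with the number of stored vectors, which is acceptable for an offline job but is exactly what approximate indexes are designed to avoid.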

4.4 Data Filtering Implementation

We implement filtering by creating partitions based on time and data source, avoiding Milvus's scalar filtering because pre‑filtering can degrade performance in our workload.
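
With partition‑based filtering, a time‑range query becomes a matter of choosing which partitions to search. The sketch below assumes one partition per data source per day and a `<source>_<YYYYMMDD>` naming scheme; both the granularity and the naming are hypothetical illustrations, not our actual layout.

```python
from datetime import date, timedelta

def partition_name(source, day):
    """Hypothetical naming scheme: <source>_<YYYYMMDD>."""
    return f"{source}_{day:%Y%m%d}"

def partitions_for_range(source, start, end):
    """List partition names covering start..end (inclusive) for one source."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    days = (end - start).days
    return [partition_name(source, start + timedelta(days=i))
            for i in range(days + 1)]

names = partitions_for_range(
    "product_image", date(2023, 11, 19), date(2023, 11, 23)
)
print(names)
```

In pymilvus, a list like this would be passed as the `partition_names` argument of a search call, so only the selected partitions are scanned.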

4.5 Search Result Display

The figure shows the top‑20 most similar products retrieved for the query term "萨摩耶" (Samoyed) between 2023‑11‑19 and 2023‑11‑23.

About the Author

Xu Zuohong, backend R&D engineer at Zhuanzhuan risk control, responsible for model development and maintenance.
