Build an End-to-End Image-and-Text Search Engine with CLIP and ESCloud
This guide shows how to quickly create a complete image-and-text search solution using Volcano Engine's ESCloud, the CLIP model for feature extraction, and Python, covering data preparation, environment setup, index mapping, bulk indexing, and both text-to-image and image-to-image queries.
Image search is widely used in e‑commerce, advertising, design, and search engines, allowing users to find matching or similar images by entering text descriptions or uploading pictures.
How It Works
The system uses the CLIP model to embed both images and text into a shared vector space, so that a text description and a matching picture end up close together. A query is answered by running a vector similarity search over a large image database and returning the most relevant results. Feature extraction uses CLIP, while vector retrieval is powered by Volcano Engine's ESCloud.
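To make the retrieval step concrete, here is a minimal brute-force sketch of what a vector similarity search computes: score every stored image vector against the query vector by cosine similarity and keep the top k. The function names, photo IDs, and toy 3-dimensional vectors are illustrative assumptions; real CLIP embeddings are much longer, and ESCloud is expected to answer such queries from its k-NN index rather than by a linear scan like this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=5):
    """Return (photo_id, score) pairs for the k most similar stored vectors."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for image embeddings (hypothetical IDs and values).
image_vectors = {
    "photo_a": [1.0, 0.0, 0.0],
    "photo_b": [0.7, 0.7, 0.0],
    "photo_c": [0.0, 0.0, 1.0],
}
query_vector = [0.9, 0.1, 0.0]  # stand-in for an encoded text or image query

results = top_k(query_vector, image_vectors, k=2)
print(results)
```

Because the same scoring applies whether the query vector came from text or from a picture, text-to-image and image-to-image search can share one index.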
Environment Preparation
1. Log in to Volcano Engine Cloud Search, create an instance cluster and select version 7.10.
2. Install required Python dependencies:
pip install -U sentence-transformers
pip install -U elasticsearch7==7.10.1
pip install -U pandas
Dataset Preparation
We use the Unsplash Lite dataset (~25,000 photos). After downloading and extracting the zip, a set of TSV files provides the photo metadata, including the image URLs; these files are read with pandas.
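import glob
import os
import pandas as pd

def read_imgset():
    # Directory that holds the extracted dataset files.
    path = '${downloaded_dataset_path}'
    documents = ['photos', 'keywords', 'collections', 'conversions', 'colors']
    datasets = {}
    for doc in documents:
        # A table may be split across several files; the '*' covers any suffix after .tsv.
        files = glob.glob(os.path.join(path, doc + ".tsv*"))
        subsets = []
        for filename in files:
            df = pd.read_csv(filename, sep='\t', header=0)
            subsets.append(df)
        # Merge all parts of the table into a single DataFrame.
        datasets[doc] = pd.concat(subsets, axis=0, ignore_index=True)
    return datasets
Model Selection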
The clip-ViT-B-32 model (based on OpenAI's 2021 CLIP paper) is chosen for both image-to-image and text-to-image search because it represents images and text in the same vector space.
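Since the index mapping has to declare a fixed vector dimension, it can help to check each embedding before indexing it. The sketch below assumes the 512-dimensional output of CLIP ViT-B/32, which is the value used for "dimension" in the mapping. `to_knn_vector` is a hypothetical helper, not part of sentence-transformers or ESCloud. It converts an embedding (a NumPy array or a plain sequence) into a JSON-serializable list and rejects vectors of the wrong length.

```python
EMBEDDING_DIM = 512  # assumed output size of CLIP ViT-B/32; must equal "dimension" in the mapping


def to_knn_vector(embedding, dim=EMBEDDING_DIM):
    """Convert an embedding to a plain list of floats and validate its length."""
    # NumPy arrays expose tolist(); plain sequences are copied as-is.
    values = embedding.tolist() if hasattr(embedding, "tolist") else list(embedding)
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    return [float(v) for v in values]


# Usage with a dummy vector standing in for a real model output.
vector = to_knn_vector([0.0] * 512)
print(len(vector))
```

Failing early on a dimension mismatch is usually easier to debug than a rejected bulk request later on.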
ESCloud Mapping Preparation
PUT image_search
{
  "mappings": {
    "dynamic": "false",
    "properties": {
      "photo_id": { "type": "keyword" },
      "photo_url": { "type": "keyword" },
      "describe": { "type": "text" },
      "photo_embedding": { "type": "knn_vector", "dimension": 512 }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "60s",
      "number_of_shards": "3",
      "knn.space_type": "cosinesimil",
      "knn": "true",
      "number_of_replicas": "1"
    }
  }
}
ESCloud Database Operations
Connection
Connect to the cloud search instance:
from elasticsearch7 import Elasticsearch as CloudSearch
cloudSearch = CloudSearch("https://{user}:{password}@{ES_URL}", verify_certs=False, ssl_show_warn=False)
Write
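Before any documents are written, the image_search index has to exist. The mapping shown earlier is written as a console request; as a hedged alternative, the same body can be sent from Python through the client's `indices.exists` and `indices.create` calls. In this sketch `build_index_body` and `create_image_index` are hypothetical helpers, and the client is passed in as a parameter so the function can be used with the `cloudSearch` object from the connection step.

```python
def build_index_body(dim=512):
    """Index settings and mapping mirroring the console request in this guide."""
    return {
        "mappings": {
            "dynamic": "false",
            "properties": {
                "photo_id": {"type": "keyword"},
                "photo_url": {"type": "keyword"},
                "describe": {"type": "text"},
                "photo_embedding": {"type": "knn_vector", "dimension": dim},
            },
        },
        "settings": {
            "index": {
                "refresh_interval": "60s",
                "number_of_shards": "3",
                "knn.space_type": "cosinesimil",
                "knn": "true",
                "number_of_replicas": "1",
            }
        },
    }


def create_image_index(client, name="image_search", dim=512):
    """Create the index unless it already exists; return True if it was created."""
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, body=build_index_body(dim))
    return True
```

A call such as `create_image_index(cloudSearch)` would then be safe to run repeatedly, since an existing index is left untouched.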
from sentence_transformers import SentenceTransformer
from elasticsearch7 import Elasticsearch as CloudSearch
from PIL import Image
import glob
import requests
import pandas as pd

# CLIP image encoder, plus the multilingual text encoder used for text queries.
img_model = SentenceTransformer('clip-ViT-B-32')
text_model = SentenceTransformer('clip-ViT-B-32-multilingual-v1')

def encodedataset(photo_id, photo_url, describe, image):
    return {
        "photo_id": photo_id,
        "photo_url": photo_url,
        "describe": describe,
        # encode() returns a NumPy array; convert it to a list so it can be serialized as JSON.
        "photo_embedding": img_model.encode(image).tolist()
    }

def load_image(url_or_path):
    # Accept either a remote URL or a local file path.
    if url_or_path.startswith(("http://", "https://")):
        return Image.open(requests.get(url_or_path, stream=True).raw)
    return Image.open(url_or_path)
def get_imgset_and_bulk(batch_size=20):
    datasets = read_imgset()
    keywords = datasets['keywords']
    docs = []
    for idx, row in datasets['photos'].iterrows():
        photo_url = row["photo_image_url"]
        photo_id = row["photo_id"]
        image = load_image(photo_url)
        # Keep only the keywords that users suggested for this photo.
        matched = keywords.loc[(keywords['photo_id'] == photo_id) & (keywords['suggested_by_user'] == 't')]
        text = ' '.join(set(matched['keyword']))
        one_document = encodedataset(photo_id, photo_url, text, image)
        # Each document is preceded by its bulk action line.
        docs.append({"index": {}})
        docs.append(one_document)
        # Flush once batch_size documents (two entries each) have accumulated.
        if len(docs) >= 2 * batch_size:
            resp = cloudSearch.bulk(docs, index='image_search')
            print(resp)
            docs = []
    # Send whatever is left over from the final partial batch.
    if docs:
        resp = cloudSearch.bulk(docs, index='image_search')
        print(resp)

if __name__ == '__main__':
    get_imgset_and_bulk()
Query
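Both queries in this section send the same request shape and differ only in which model produces the vector. A small sketch of that shared shape, plus a helper that flattens the response into (score, URL, description) tuples, may make the results easier to consume. `build_knn_body` and `summarize_hits` are hypothetical names, and the sample response is made up purely to show the usual `hits.hits` layout of a search response.

```python
def build_knn_body(vector, k=5):
    """Request body for a k-NN search on the photo_embedding field."""
    # NumPy arrays are converted with tolist() so the body stays JSON-serializable.
    values = vector.tolist() if hasattr(vector, "tolist") else list(vector)
    return {
        "size": k,
        "query": {"knn": {"photo_embedding": {"vector": values, "k": k}}},
        "_source": ["describe", "photo_url"],
    }


def summarize_hits(response):
    """Extract (score, photo_url, describe) from a search response dict."""
    return [
        (hit["_score"], hit["_source"].get("photo_url"), hit["_source"].get("describe"))
        for hit in response["hits"]["hits"]
    ]


# Illustrative response with made-up values, not real search output.
sample_response = {
    "hits": {
        "hits": [
            {
                "_score": 0.91,
                "_source": {"photo_url": "https://example.com/a.jpg", "describe": "beach sunset"},
            },
        ]
    }
}
print(summarize_hits(sample_response))
```

With helpers like these, a text query and an image query would each reduce to encoding the input and passing the resulting vector to `build_knn_body`.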
Text‑to‑Image
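def extract_text(text):
    res = cloudSearch.search(
        body={
            "size": 5,
            # Convert the NumPy embedding to a list so the request body is JSON-serializable.
            "query": {"knn": {"photo_embedding": {"vector": text_model.encode(text).tolist(), "k": 5}}},
            "_source": ["describe", "photo_url"]
        },
        # Query the same index that was created and written to earlier.
        index="image_search"
    )
    return res
Image‑to‑Image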
def extract(img):
    res = cloudSearch.search(
        body={
            "size": 5,
            # Convert the NumPy embedding to a list so the request body is JSON-serializable.
            "query": {"knn": {"photo_embedding": {"vector": img_model.encode(img).tolist(), "k": 5}}},
            "_source": ["describe", "photo_url"]
        },
        # Query the same index that was created and written to earlier.
        index="image_search"
    )
    return res
Volcano Engine's ESCloud is compatible with Elasticsearch, Kibana, and common plugins. It offers structured and unstructured text search, statistics, and reporting, with one‑click deployment, elastic scaling, and simplified operations for log analysis and information retrieval.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
