
Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

The Imazon platform unifies Baidu’s image acquisition, feature extraction, and ANN‑based multimodal retrieval into a cloud‑native, real‑time pipeline that ingests billions of images daily. Along the way it optimizes storage and GPU usage, reduces message‑queue costs, and sustains high‑throughput, low‑latency search across text, visual, and voice queries.

Baidu Geek Talk

Baidu Search consists of two major components: an online service that responds to user queries and an offline service that transforms and aggregates massive data from many sources before feeding it to the online layer. The offline side is a typical large‑scale hybrid of batch and real‑time processing.

Since 2015, the Baidu App has offered multimodal retrieval, extending traditional text search with visual and voice search capabilities. Visual retrieval products such as "guess word", "more‑size images", "short‑video images", and "similar recommendation" rely on core technologies like image classification (GPU‑based online models) and approximate nearest neighbor (ANN) search.

ANN techniques used include Baidu’s internal gno‑imi (low‑memory), graph‑based HNSW, and locality‑sensitive hashing (LSH) for SIFT‑type features. The choice balances cost, memory footprint, and feature suitability.
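Of these, LSH is the simplest to illustrate. Below is a minimal random‑hyperplane LSH sketch for cosine similarity; the dimensions, bit width, and vectors are illustrative, and this is a generic textbook variant, not Baidu's gno‑imi or its production LSH:

```python
import random

def make_hyperplanes(dim: int, n_bits: int, seed: int = 42):
    """One random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, planes) -> int:
    """Each bit records which side of a hyperplane the vector lies on;
    vectors with high cosine similarity tend to share most bits."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (dot >= 0)
    return bits

def hamming(x: int, y: int) -> int:
    return bin(x ^ y).count("1")

planes = make_hyperplanes(dim=4, n_bits=16)
near = hamming(lsh_signature([1.0, 0.9, 0.1, 0.0], planes),
               lsh_signature([0.9, 1.0, 0.0, 0.1], planes))
far = hamming(lsh_signature([1.0, 0.9, 0.1, 0.0], planes),
              lsh_signature([-1.0, 0.1, -0.9, 0.2], planes))
# With fixed planes, similar vectors collide on far more bits than dissimilar ones.
```

Candidates are then retrieved by comparing compact bit signatures instead of full feature vectors, which is what makes LSH attractive for high‑volume SIFT‑type features.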

The project, internally named Imazon (Image + Amazon), aims to unify data acquisition and processing for image‑related business lines, handling billions of images per day with second‑level real‑time ingestion while reducing storage and compute costs.

The processing pipeline follows six stages: web spider → image content extraction → image spider → feature extraction (hundreds of features) → relationship storage → indexing. This flow is illustrated in the original article’s diagrams.
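The six stages form a linear DAG, which can be sketched with a generic topological executor (the stage names mirror the article; the executor itself is a stdlib illustration, not Imazon's multi‑language DAG framework):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each stage maps to the set of stages it depends on (the article's six stages).
pipeline = {
    "web_spider": set(),
    "image_content_extraction": {"web_spider"},
    "image_spider": {"image_content_extraction"},
    "feature_extraction": {"image_spider"},
    "relationship_storage": {"feature_extraction"},
    "indexing": {"relationship_storage"},
}

# static_order yields the stages so that every dependency runs first.
order = list(TopologicalSorter(pipeline).static_order())
```

In the real platform each stage fans out across many workers, but the dependency structure scheduled by the framework is exactly this shape.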

Key technical metrics include:

Throughput: items average ~100 KB each; real‑time ingestion runs at ~100 QPS, whole‑web ingestion at ~10k QPS.
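Back‑of‑the‑envelope, those figures imply roughly 1 GB/s of raw ingest bandwidth at whole‑web rates (the arithmetic below is ours, derived only from the stated numbers):

```python
item_kb = 100           # ~100 KB per item
realtime_qps = 100      # ~100 QPS real-time ingestion
whole_web_qps = 10_000  # ~10k QPS whole-web ingestion

whole_web_mb_per_s = item_kb * whole_web_qps / 1024   # ~976 MB/s, i.e. ~1 GB/s
realtime_mb_per_s = item_kb * realtime_qps / 1024     # ~10 MB/s
daily_tb = whole_web_mb_per_s * 86_400 / 1024 / 1024  # ~80 TB/day at full rate
```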

Scalability: cloud‑native deployment with elastic compute scheduling.

Stability: no data loss, with automatic retry and replay; success rates are guaranteed at minute granularity for time‑sensitive data and at day granularity for whole‑web data.

Effectiveness focuses on accurate image‑web link relationships, while R&D efficiency emphasizes language flexibility (C++, Go, PHP) and business‑wide reuse of data sources and DAG outputs.

Optimization practices described include:

Message‑queue cost reduction by passing only small trigger messages (byte‑sized references) through the queue while the large payloads live in a side cache.
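This is essentially the claim‑check pattern: the queue carries a small key while the payload sits in a side cache. A minimal sketch, with a dict and a list standing in for the real cache and queue services:

```python
import hashlib
import json

side_cache = {}  # stands in for a real KV side cache
queue = []       # stands in for the real message queue

def publish(payload: bytes) -> str:
    """Store the large payload out of band; enqueue only a small trigger
    message holding its cache key (the claim-check pattern)."""
    key = hashlib.sha1(payload).hexdigest()
    side_cache[key] = payload
    queue.append(json.dumps({"cache_key": key, "size": len(payload)}))
    return key

def consume() -> bytes:
    """Pop a trigger message and fetch the payload it points at."""
    trigger = json.loads(queue.pop(0))
    return side_cache[trigger["cache_key"]]

publish(b"x" * 100_000)              # a ~100 KB image payload
assert len(queue[0]) < 200           # the queue itself carries ~100 bytes
assert consume() == b"x" * 100_000
```

At the article's item sizes this shrinks per‑message queue traffic by roughly three orders of magnitude, which is where the cost saving comes from.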

Flow‑control mechanisms that smooth traffic peaks, applying back‑pressure to keep system utilization high.
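A token bucket is one common way to implement such smoothing; the sketch below (a generic mechanism, not Imazon's actual controller) admits short bursts while capping the sustained rate and signaling back‑pressure to callers:

```python
class TokenBucket:
    """Admit at most `rate` requests/second on average, with bursts up to
    `capacity`; callers that see False should back off (back-pressure)."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, clamped to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=100.0, capacity=10.0)  # 100 QPS sustained, burst of 10
burst = [bucket.allow(now=0.0) for _ in range(15)]
# The first 10 calls pass (the burst); the next 5 are shed until tokens refill.
```

Requests that are shed are retried later rather than dropped, which is how the peak is spread out while keeping downstream utilization high.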

Handling GPU bottlenecks for feature computation by splitting DAGs and using storage‑backed flow control.
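The split can be pictured as decoupling CPU‑side producers from GPU‑side consumers through a bounded buffer that producers cannot overrun; this toy version uses an in‑memory deque where the real system uses storage, and the names here are assumptions:

```python
from collections import deque

class BoundedBuffer:
    """Stands in for the storage-backed buffer between the CPU-side DAG and
    the GPU-side DAG: producers get False when full, so upstream slows
    down instead of overrunning the GPU stage."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()

    def offer(self, item) -> bool:
        if len(self.items) >= self.capacity:
            return False          # back-pressure signal to the producer
        self.items.append(item)
        return True

    def poll(self):
        return self.items.popleft() if self.items else None

buf = BoundedBuffer(capacity=2)
assert buf.offer("img-1") and buf.offer("img-2")
assert not buf.offer("img-3")     # buffer full: producer must retry later
assert buf.poll() == "img-1"      # GPU stage drains at its own pace
assert buf.offer("img-3")         # room again after the drain
```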

The content‑relationship engine models three entity types: F (web page), O (image URL), and C (image content signature), linked by fo and oc edges. It uses a three‑table design (a prefix‑hashed C‑table plus O‑ and F‑tables on SSD) to achieve PB‑scale graph storage with high write throughput (~10k QPS for vertices, ~100k QPS for edges) and GB/s‑scale scan throughput on reads.
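One way to picture the prefix‑hash C‑table and the two edge types is the key construction below; the exact key formats are internal to Baidu, so this is only a guess at the general shape:

```python
import hashlib

def c_key(content_sign: str) -> str:
    """C-table: prepend a short hash prefix so hot key ranges spread
    evenly across partitions (prefix hashing)."""
    prefix = hashlib.md5(content_sign.encode()).hexdigest()[:4]
    return f"{prefix}:{content_sign}"

def fo_edge(page_url: str, image_url: str) -> tuple:
    """fo edge: a web page F links to an image URL O."""
    return ("fo", page_url, image_url)

def oc_edge(image_url: str, content_sign: str) -> tuple:
    """oc edge: an image URL O resolves to a content signature C."""
    return ("oc", image_url, content_sign)

edges = [
    fo_edge("https://example.com/page", "https://example.com/a.jpg"),
    oc_edge("https://example.com/a.jpg", "sig_0f3a"),
]
key = c_key("sig_0f3a")   # "<4-hex-char prefix>:sig_0f3a"
```

Walking fo then oc edges answers "which pages embed this image content", which is the image‑web link relationship the effectiveness metric cares about.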

Additional engineering practices cover data source reuse, multi‑language DAG framework, resource storage reuse via reference counting, and operational monitoring (distributed tracing, alerting, KPI dashboards).
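Reference‑counted resource reuse can be sketched as a deduplicated blob store where each blob tracks its owners and is reclaimed only when the last owner releases it (a generic sketch, not the platform's code):

```python
class RefCountedStore:
    """Deduplicated blob store: identical content is stored once and freed
    only when every business line that acquired it has released it."""
    def __init__(self):
        self.blobs = {}  # key -> blob
        self.refs = {}   # key -> reference count

    def acquire(self, key, blob=None):
        if key in self.refs:
            self.refs[key] += 1          # reuse the existing copy
        else:
            self.blobs[key] = blob
            self.refs[key] = 1
        return self.blobs[key]

    def release(self, key):
        self.refs[key] -= 1
        if self.refs[key] == 0:          # last owner gone: reclaim storage
            del self.refs[key]
            del self.blobs[key]

store = RefCountedStore()
store.acquire("img-sig", blob=b"...bytes...")
store.acquire("img-sig")                 # second consumer, no second copy
store.release("img-sig")
assert "img-sig" in store.blobs          # still referenced
store.release("img-sig")
assert "img-sig" not in store.blobs      # reclaimed
```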

The article also contains a recruitment notice for positions related to computer‑vision processing, indexing, and offline stream processing.

Tags: cloud-native, Big Data, stream processing, image processing, DAG, multimodal retrieval, search infrastructure