
How Baidu Scales Multimodal Image Search with the Imazon Platform

This article explains Baidu's multimodal retrieval system, detailing the offline and online pipelines, the image processing and indexing platform (Imazon), its architecture, key technologies such as ANN and GPU models, and the optimization practices that enable massive daily image ingestion and real‑time search at billion‑scale.

21CTO

Multimodal Retrieval: Offline and Online

Baidu Search is split into "search online" and "search offline" components. The online service answers user queries; the offline service transforms and processes data from various sources and feeds the results to the online service. Together they form a classic combination of large‑scale batch and real‑time computing.

Image Processing and Ingestion Platform (Imazon)

Since 2015, Baidu App has offered multimodal search, adding visual and voice capabilities to traditional text search. Visual search and related image services share common offline and online technologies: both rely on classification, served by online model inference on GPUs, and on approximate nearest neighbor (ANN) retrieval.
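To make the ANN idea concrete, here is a minimal sketch using random-hyperplane locality-sensitive hashing. The article does not say which ANN algorithm Baidu uses, so LSH is an assumption chosen for brevity, and `LshIndex`, `make_planes`, and `signature` are hypothetical names.

```python
import math
import random


def make_planes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Pack the side-of-hyperplane bits into one integer bucket id."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class LshIndex:
    """Illustrative ANN index; not Baidu's implementation."""

    def __init__(self, dim, n_bits=8, seed=0):
        self.planes = make_planes(dim, n_bits, seed)
        self.buckets = {}  # bucket id -> list of (key, vector)

    def add(self, key, vec):
        bucket = signature(vec, self.planes)
        self.buckets.setdefault(bucket, []).append((key, vec))

    def query(self, vec, k=1):
        # Only the query's own bucket is scanned: fast, but neighbors that
        # hashed into other buckets are missed (the "approximate" part).
        candidates = self.buckets.get(signature(vec, self.planes), [])
        ranked = sorted(candidates, key=lambda kv: cosine(vec, kv[1]), reverse=True)
        return [key for key, _ in ranked[:k]]
```

Vectors pointing in the same direction always share a bucket, so a rescaled copy of an indexed feature vector retrieves the original. Production systems typically probe several buckets or use graph-based indexes to raise recall.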

The offline pipeline collects images from the entire web, computes hundreds of features per image, and maintains relationships among images, image links, and web pages.

To handle this workload at web scale, the Search Architecture and Content Technology Architecture teams co‑designed the "Image Processing and Ingestion Platform" (Imazon) with three main goals:

Unified data acquisition and processing to integrate image‑related workflows, improve efficiency, and reduce storage and compute costs.

Support billions to tens of billions of images with fast data collection, ingestion, and updates.

Provide real‑time image filtering and customized delivery pipelines to improve timeliness of image resources.

Imazon combines high‑throughput streaming processing with batch capabilities, leveraging elastic compute, event‑driven scheduling, and DAG decoupling.

Platform Architecture and Key Technologies

The platform processes data through six stages: web spider → image extraction → image spider → feature computation → relationship storage → indexing.
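The six stages can be sketched as a small DAG whose execution order is derived from declared dependencies. This is only an illustration: the stage identifiers and the `run_pipeline` helper are hypothetical, the linear dependency chain is taken from the order given above, and the real scheduler (Odyssey) is not described in enough detail to reproduce. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (assumed linear chain).
PIPELINE = {
    "web_spider": set(),
    "image_extraction": {"web_spider"},
    "image_spider": {"image_extraction"},
    "feature_computation": {"image_spider"},
    "relationship_storage": {"feature_computation"},
    "indexing": {"relationship_storage"},
}


def stage_order(dag):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(record, handlers, dag=PIPELINE):
    """Pass a record through one handler per stage, in dependency order."""
    for stage in stage_order(dag):
        record = handlers[stage](record)
    return record
```

Declaring dependencies rather than hard-coding the call sequence is what lets stages be split, reordered, or scaled independently.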

Technical metrics focus on architecture (throughput, scalability, stability), effectiveness (accurate image‑page relationships), and development efficiency (business reusability and language flexibility).

Throughput: each image plus its features is about 100 KB; real‑time ingestion runs at hundreds of QPS and full‑web ingestion at tens of thousands of QPS.

Scalability: cloud‑native deployment with elastic resource scheduling.

Stability: no data loss, automatic retry and replay, minute‑level success rate for time‑sensitive data.
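The throughput target above implies a rough bandwidth budget, which the following back-of-the-envelope calculation makes explicit. The concrete rates (300 QPS for "hundreds", 10,000 QPS as a lower bound for full-web ingestion) and decimal units are assumptions for illustration, not figures from Baidu.

```python
KB = 1_000   # decimal kilobyte, assumed
GB = 1_000_000_000
TB = 1_000_000_000_000

RECORD_BYTES = 100 * KB   # ~100 KB per image plus features (from the article)
REALTIME_QPS = 300        # assumed value for "hundreds of QPS"
FULL_WEB_QPS = 10_000     # assumed lower bound for full-web ingestion


def bandwidth_bytes_per_s(record_bytes, qps):
    """Sustained ingest bandwidth for a given record size and request rate."""
    return record_bytes * qps


realtime_bw = bandwidth_bytes_per_s(RECORD_BYTES, REALTIME_QPS)
full_web_bw = bandwidth_bytes_per_s(RECORD_BYTES, FULL_WEB_QPS)
full_web_daily_tb = full_web_bw * 86_400 / TB

print(f"real-time: {realtime_bw / GB:.2f} GB/s")
print(f"full-web:  {full_web_bw / GB:.2f} GB/s, {full_web_daily_tb:.1f} TB/day")
```

Under these assumptions, full-web ingestion moves about 1 GB/s, or roughly 86 TB per day, which helps explain why message-queue and storage costs dominate the optimization work.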

Effectiveness metrics emphasize true image‑page link relationships and content signatures.

Development efficiency is measured by business universality (supporting all image‑dependent services) and language flexibility (C++, Go, PHP).
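The stability target listed above (no data loss, automatic retry and replay) can be illustrated with a small retry loop that parks exhausted records for later replay. This is a generic sketch, not Imazon's mechanism; `process_with_retry`, `replay`, and the in-memory `dead_letter` list are hypothetical.

```python
import time


def process_with_retry(record, handler, dead_letter, max_attempts=3, base_delay_s=0.1):
    """Run handler(record); retry with exponential backoff, then park the record."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(record)
        except Exception:
            if attempt == max_attempts:
                # Never drop data: keep the record so it can be replayed later.
                dead_letter.append(record)
                return None
            time.sleep(base_delay_s * 2 ** (attempt - 1))


def replay(dead_letter, handler):
    """Re-run parked records; anything that still fails stays parked."""
    results, still_failing = [], []
    for record in dead_letter:
        try:
            results.append(handler(record))
        except Exception:
            still_failing.append(record)
    dead_letter[:] = still_failing
    return results
```

A durable store would replace the in-memory list in practice, since a process crash would otherwise lose the parked records.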

Infrastructure

The underlying Baidu infrastructure includes storage (Table, BDRP/Redis, UNDB, BOS), a message queue (BigPipe), and service frameworks (BaiduRPC, GDP‑Go, ODP‑PHP). Supporting components comprise a pipeline scheduler (Odyssey), a flow‑control system, an elastic compute platform (Qianren), a content relationship engine, and offline micro‑service components (Tigris).

Optimization Practices

To handle massive throughput under limited resources, Baidu applied several optimizations:

Message‑queue cost reduction: messages carry only references (trigger messages), while operator outputs are stored in a side cache.

Flow‑control and back‑pressure mechanisms to smooth traffic spikes and prioritize high‑priority data.

Dynamic resource scaling and DAG splitting to mitigate GPU bottlenecks during feature computation.
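The first optimization in this list, passing references instead of payloads, can be sketched as follows. The `SideCache` class, the message fields, and the content-addressed key are assumptions for illustration; the article does not describe the actual message format or the BigPipe API.

```python
import hashlib
import json
from collections import deque


class SideCache:
    """Stand-in for an external cache that holds bulky operator outputs."""

    def __init__(self):
        self._store = {}

    def put(self, payload: bytes) -> str:
        key = hashlib.sha256(payload).hexdigest()  # content-addressed key (assumed)
        self._store[key] = payload
        return key

    def get(self, key: str) -> bytes:
        return self._store[key]


def publish(queue, cache, image_url, operator_output: bytes) -> str:
    """Store the output in the cache and enqueue only a small trigger message."""
    message = json.dumps({
        "image_url": image_url,
        "cache_key": cache.put(operator_output),
        "size": len(operator_output),
    })
    queue.append(message)
    return message


def consume(queue, cache):
    """Pop a trigger message and resolve its reference back to the payload."""
    message = json.loads(queue.popleft())
    return message["image_url"], cache.get(message["cache_key"])
```

With a ~100 KB payload, the queued message stays at a few hundred bytes, so queue bandwidth scales with message count rather than data volume.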

Additional practices improve reuse of data sources, DAG outputs, and storage resources, and support multi‑language development through RPC‑based DAG frameworks.
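The flow-control and back-pressure idea from the list above can be sketched with a bounded priority buffer in front of a token-bucket rate limiter. Both classes are hypothetical and time is passed in explicitly to keep the example deterministic; Baidu's flow-control system is not described at this level of detail.

```python
import heapq
import itertools


class PriorityBuffer:
    """Bounded buffer; a rejected offer is the back-pressure signal."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []
        self._seq = itertools.count()  # preserves FIFO order within a priority

    def __len__(self):
        return len(self._heap)

    def offer(self, item, priority):
        """Lower priority number means more urgent. Returns False when full."""
        if len(self._heap) >= self.capacity:
            return False
        heapq.heappush(self._heap, (priority, next(self._seq), item))
        return True

    def poll(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


class TokenBucket:
    """Allows bursts up to `burst`, then a steady `rate_per_s`."""

    def __init__(self, rate_per_s, burst, now=0.0):
        self.rate, self.burst = rate_per_s, burst
        self.tokens, self.last = float(burst), now

    def allow(self, now):
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def drain(buffer, bucket, now):
    """Release as many buffered items as the rate limit permits at time `now`."""
    released = []
    while len(buffer) > 0 and bucket.allow(now):
        released.append(buffer.poll())
    return released
```

Spikes are absorbed by the buffer, downstream load is capped by the bucket, and urgent items leave first; producers that see `offer` return `False` are expected to slow down or retry.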

Content Relationship Engine

The engine models a three‑layer graph (from‑URL → object‑URL → content‑sign) whose edges capture page‑image context and crawl timestamps, enabling petabyte‑scale graph storage with high write and read throughput.

The design uses three tables (C, O, F) with prefix hashing, SSD‑backed storage, and version‑based validation to ensure consistency while supporting massive graph operations.
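A minimal in-memory sketch of this layout follows. The article does not define the three tables, so the mapping used here (F keyed by from-URL, O keyed by object-URL, C keyed by content signature) is an assumption, as are the `RelationStore` class, the 4-character hash prefix, and the use of crawl timestamps as versions.

```python
import hashlib


def row_key(key, prefix_len=4):
    """Prepend a short hash so lexically adjacent keys spread across shards."""
    prefix = hashlib.sha256(key.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}:{key}"


class RelationStore:
    """Illustrative three-table graph store; not Baidu's schema."""

    def __init__(self):
        self.f_table = {}  # from-URL row   -> {object_url: crawl version}
        self.o_table = {}  # object-URL row -> (content_sign, version)
        self.c_table = {}  # content-sign row -> set of object URLs

    def link_page(self, from_url, object_url, version):
        """Record a page -> image-link edge unless a newer version exists."""
        edges = self.f_table.setdefault(row_key(from_url), {})
        if edges.get(object_url, -1) >= version:
            return False  # stale write rejected
        edges[object_url] = version
        return True

    def set_content(self, object_url, content_sign, version):
        """Record an image-link -> content-signature edge, newest version wins."""
        row = row_key(object_url)
        current = self.o_table.get(row)
        if current is not None and current[1] >= version:
            return False  # stale write rejected
        if current is not None:
            self.c_table[row_key(current[0])].discard(object_url)
        self.o_table[row] = (content_sign, version)
        self.c_table.setdefault(row_key(content_sign), set()).add(object_url)
        return True

    def images_on_page(self, from_url):
        """Walk F -> O: image links on a page with their content signatures."""
        edges = self.f_table.get(row_key(from_url), {})
        found = []
        for object_url in sorted(edges):
            entry = self.o_table.get(row_key(object_url))
            if entry is not None:
                found.append((object_url, entry[0]))
        return found

    def duplicates(self, content_sign):
        """Read C: every image link that resolves to the same content."""
        return sorted(self.c_table.get(row_key(content_sign), set()))
```

Rejecting writes that carry an older version is one simple way to keep the tables consistent when crawl results arrive out of order.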

Operational Improvements

Efforts to reduce maintenance cost include distributed tracing, comprehensive monitoring, and alerting. Monitoring focuses on core business metrics (ingestion rate, latency, error rates) and system health indicators (DAG throughput, operator (OP) status, service latency).
