Design and Optimization of Baidu's Image Processing and Ingestion Platform (Imazon) for Multimodal Retrieval
This article details Baidu's multimodal retrieval architecture, explaining the separation of online and offline services, the design of the Imazon image processing and ingestion platform, its technical indicators, large‑scale streaming and batch pipelines, optimization practices for high throughput, and the underlying content‑relationship engine.
In Baidu Search, the system is divided into "online" services that respond to user queries and "offline" services that preprocess massive data before feeding it to the online layer; the offline component exemplifies a hybrid batch‑real‑time processing scenario.
Since 2015, Baidu App has offered multimodal retrieval, adding visual and voice search to traditional text search. Visual search products (word guessing, multi-size image variants, short video, e‑commerce) rely on classification models served online on GPUs and on ANN retrieval.
The offline pipeline processes hundreds of terabytes of image data, extracting over a hundred feature types and maintaining image‑page‑link relationships for provenance.
The "Image Processing Ingestion Platform" (internal name Imazon) unifies data acquisition, processing, and storage for image‑related services, achieving unified data handling, supporting billions of images per day, and enabling real‑time ingestion at hundreds of QPS.
Technical indicators
Throughput: single data item ~100 KB, real‑time ingestion ~100 QPS, full‑web ingestion ~10 K QPS.
Scalability: cloud‑native deployment with elastic compute scheduling.
Stability: no data loss, with automatic retry and replay; time‑sensitive data is ingested successfully within minutes, full‑web data within a day.
Effectiveness: accurate image‑page link relationships, supporting updates when pages or images disappear.
R&D efficiency: multi‑language support (C++, Go, PHP), reusable DAG components, and standardized schemas.
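The throughput figures above imply a concrete bandwidth budget. A back‑of‑the‑envelope estimate (the constants come straight from the indicators; the arithmetic is illustrative, not from the source):

```python
# Back-of-the-envelope bandwidth estimate from the stated indicators.
ITEM_SIZE_KB = 100        # ~100 KB per data item
FULL_WEB_QPS = 10_000     # ~10 K QPS for full-web ingestion

bandwidth_mb_s = ITEM_SIZE_KB * FULL_WEB_QPS / 1024           # ~977 MB/s sustained
daily_volume_tb = bandwidth_mb_s * 86_400 / 1024 / 1024       # ~80 TB/day

print(f"{bandwidth_mb_s:.0f} MB/s, {daily_volume_tb:.1f} TB/day")
```

At roughly 80 TB/day, the "hundreds of terabytes" of image data mentioned earlier corresponds to only a few days of full‑web ingestion, which is why elastic scaling and cost controls dominate the design.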
Architecture design
The platform is built as a streaming real‑time system with optional batch processing, employing elastic compute, event‑driven pipelines, and decoupled DAG execution. Key components include:
Storage: Table, BDRP (Redis), UNDB, BOS.
Message queue: BigPipe.
Service frameworks: BaiduRPC, GDP (Go), ODP (PHP).
Pipeline scheduler: Odyssey for DAG orchestration.
Flow‑control system for load smoothing.
Resource manager (千仞, "Qianren") for CPU/GPU operator scaling.
Content relationship engine for graph‑based image‑page mapping.
Offline micro‑service component Tigris for remote RPC execution.
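To make the decoupled DAG execution concrete, here is a minimal sketch of an event‑driven DAG runner in the spirit of Odyssey. All names (`Operator`, `run_dag`, the example operators) are illustrative stand‑ins, not Baidu APIs:

```python
# Minimal sketch of decoupled DAG execution: each operator declares its
# dependencies and runs once they have produced output. Illustrative only.
from collections import deque

class Operator:
    def __init__(self, name, fn, deps=()):
        self.name, self.fn, self.deps = name, fn, tuple(deps)

def run_dag(operators, item):
    """Execute operators in dependency order; each sees its deps' outputs."""
    done, results = set(), {}
    pending = deque(operators)
    while pending:
        op = pending.popleft()
        if not all(d in done for d in op.deps):
            pending.append(op)          # a dependency is not ready yet
            continue
        results[op.name] = op.fn(item, {d: results[d] for d in op.deps})
        done.add(op.name)
    return results

# Hypothetical pipeline: fetch image -> extract a feature -> write index.
ops = [
    Operator("fetch", lambda item, deps: f"bytes({item})"),
    Operator("feature", lambda item, deps: len(deps["fetch"]), deps=["fetch"]),
    Operator("index", lambda item, deps: ("ok", deps["feature"]), deps=["feature"]),
]
print(run_dag(ops, "img_001"))
```

Because operators only communicate through declared inputs and outputs, each stage can be scaled, replayed, or swapped independently, which is the property the platform relies on for elastic scheduling.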
Optimization practices
To handle high‑throughput, low‑cost requirements, Baidu applied:
Message‑queue cost reduction: queue messages carry only references ("trigger messages"), while large payloads are written to a side cache.
Flow‑control with back‑pressure to smooth traffic spikes, ensuring high utilization.
GPU resource elasticity and temporary resource borrowing to avoid data backlog.
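The trigger‑message technique can be sketched as follows. The in‑memory queue and dictionary below are stand‑ins for BigPipe and BDRP (Redis); the function names are illustrative, not Baidu APIs:

```python
# Sketch of the "trigger message" pattern: the queue carries only a small
# reference, while the ~100 KB payload lives in a side cache.
import json
import queue

side_cache = {}                 # stand-in for BDRP (Redis)
pipe = queue.Queue()            # stand-in for BigPipe

def publish(item_id, payload: bytes):
    side_cache[item_id] = payload            # large body -> side cache
    pipe.put(json.dumps({"id": item_id}))    # tiny trigger -> message queue

def consume():
    msg = json.loads(pipe.get())
    return side_cache.pop(msg["id"])         # fetch the body on demand

publish("img_42", b"x" * 100_000)
body = consume()
print(len(body))  # 100000
```

The queue now moves ~30‑byte triggers instead of ~100 KB bodies, cutting queue bandwidth by orders of magnitude; consumers that are throttled by flow control simply leave bodies in the cache until they are ready.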
Additional practices include data source reuse, DAG output reuse via RPC chaining, multi‑tenant storage with reference counting, and unified language support through remote RPC isolation.
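The reference‑counting idea behind multi‑tenant storage can be sketched like this: several products share one stored copy of an image, and the copy is deleted only when the last tenant releases it. The class and method names are hypothetical:

```python
# Sketch of reference-counted multi-tenant storage: one blob per key,
# tracked per tenant; deleted only when the last reference is released.
class RefCountedStore:
    def __init__(self):
        self.blobs, self.refs = {}, {}

    def put(self, key, blob, tenant):
        self.blobs.setdefault(key, blob)          # store the body only once
        self.refs.setdefault(key, set()).add(tenant)

    def release(self, key, tenant):
        self.refs[key].discard(tenant)
        if not self.refs[key]:                    # last reference is gone
            del self.blobs[key], self.refs[key]

store = RefCountedStore()
store.put("img_1", b"...", tenant="image-search")
store.put("img_1", b"...", tenant="video-search")
store.release("img_1", "image-search")
print("img_1" in store.blobs)  # True: video-search still holds a reference
store.release("img_1", "video-search")
print("img_1" in store.blobs)  # False: last reference released
```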
Content relationship engine
The engine models five entity types (f, o, c, fo, oc) and the edges between them, forming a graph with on the order of a trillion nodes, stored across three tables (C, O, F) at petabyte scale and supporting high‑performance reads and writes for graph‑based queries.
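A toy sketch of how the three tables might support a provenance query. The entity semantics assumed here (f = source page, o = image URL, c = content signature) are an interpretation for illustration only; the source does not spell them out:

```python
# Toy three-table layout for image-page-link relationships.
# Assumed semantics: f = source page, o = image URL, c = content signature.
F = {"page1": ["urlA", "urlB"]}          # f -> o edges: pages embed image URLs
O = {"urlA": "sig_x", "urlB": "sig_x"}   # o -> c edges: URL -> content signature
C = {"sig_x": ["urlA", "urlB"]}          # c -> o back-edges for provenance

def pages_for_content(sig):
    """Trace a content signature back to every page that embeds it."""
    urls = set(C.get(sig, []))
    return sorted(page for page, embeds in F.items() if urls & set(embeds))

print(pages_for_content("sig_x"))  # ['page1']
```

Keeping forward and backward edges in separate tables is what lets both directions of the query (page → content, content → page) stay fast even at trillion‑node scale.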
Overall, the Imazon platform demonstrates how large‑scale image processing, multimodal retrieval, and graph‑based relationship modeling can be achieved with cloud‑native, high‑throughput, and cost‑effective engineering.