Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)
This article details Baidu's large‑scale image processing and multimodal retrieval system, describing its offline‑online architecture, massive data ingestion pipeline, ANN search techniques, performance metrics, infrastructure components, and a series of optimizations for throughput, cost, and reliability in a high‑volume streaming environment.
In Baidu Search, the system is divided into online services that respond to user queries and offline services that ingest and transform massive amounts of data, forming a typical batch‑plus‑real‑time processing scenario for multimodal retrieval.
Since 2015, Baidu App has offered multimodal retrieval, extending traditional text search with visual and voice capabilities. Visual search relies on classification models and approximate nearest neighbor (ANN) retrieval.
For ANN, Baidu employs clustering‑based gno-imi, graph‑based HNSW, and locality‑sensitive hashing (LSH), each chosen by cost and feature type; gno-imi is an internal low‑memory solution suitable for billion‑scale ANN, while LSH improves recall for SIFT‑style local features.
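gno-imi itself is internal and undocumented here, but it belongs to the clustering (inverted-file) family of ANN indexes: vectors are bucketed under coarse centroids, and a query probes only the few closest buckets instead of scanning everything. A minimal pure-Python sketch of that idea (all class and parameter names are illustrative, not the gno-imi API):

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IVFIndex:
    """Toy inverted-file ANN index: a few Lloyd iterations build coarse
    centroids, then exact search runs only inside the nprobe closest
    clusters. Illustrative sketch, not Baidu's gno-imi."""

    def __init__(self, vectors, n_clusters=4, iters=5, seed=0):
        rng = random.Random(seed)
        self.centroids = rng.sample(vectors, n_clusters)
        for _ in range(iters):  # simplified k-means refinement
            buckets = [[] for _ in self.centroids]
            for v in vectors:
                buckets[self._nearest_centroid(v)].append(v)
            self.centroids = [
                [sum(col) / len(b) for col in zip(*b)] if b else c
                for b, c in zip(buckets, self.centroids)
            ]
        # inverted lists: each vector is filed under its nearest centroid
        self.lists = [[] for _ in self.centroids]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroid(v)].append((i, v))

    def _nearest_centroid(self, v):
        return min(range(len(self.centroids)),
                   key=lambda i: l2(v, self.centroids[i]))

    def search(self, q, k=3, nprobe=2):
        # probe only the nprobe closest clusters instead of all vectors
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(q, self.centroids[i]))
        cand = [p for i in order[:nprobe] for p in self.lists[i]]
        cand.sort(key=lambda p: l2(q, p[1]))
        return [i for i, _ in cand[:k]]
```

The memory saving comes from storing only centroids plus compact inverted lists; raising `nprobe` trades latency for recall, which is the same knob production systems expose.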
The offline pipeline must collect the entire web’s images, compute hundreds of features per image, and maintain image‑page‑web relationships, which incurs huge compute and storage costs.
To address these challenges, the Search Architecture and Content Technology teams co‑designed the "Image Processing Ingestion Platform" (internal name Imazon). Its goals are unified data acquisition and processing, support for billions‑to‑trillions of images with real‑time ingestion rates, and rapid data updates.
The platform’s workflow consists of six stages: web spider → image extraction → image spider → feature computation → relationship storage → indexing. This pipeline enables daily processing of tens of billions of images and second‑level real‑time ingestion of hundreds of images per second.
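The six stages above form a DAG of decoupled operators: each stage consumes a message, does its work, and emits to whatever is wired downstream. A toy event-driven sketch of that wiring (stage names mirror the pipeline; the `Pipeline` class and lambdas are assumptions for illustration, not Imazon's actual framework):

```python
from collections import deque

class Pipeline:
    """Toy event-driven DAG: operators are functions wired by name;
    each emitted message is queued and dispatched downstream."""

    def __init__(self):
        self.ops, self.edges = {}, {}

    def stage(self, name, fn, downstream=()):
        self.ops[name] = fn
        self.edges[name] = list(downstream)
        return self

    def run(self, entry, message):
        results, queue = [], deque([(entry, message)])
        while queue:
            name, msg = queue.popleft()
            out = self.ops[name](msg)
            if not self.edges[name]:        # sink stage: collect output
                results.append(out)
            for nxt in self.edges[name]:
                queue.append((nxt, out))
        return results

p = (Pipeline()
     .stage("image_extraction",
            lambda page: page["image_urls"], ["image_spider"])
     .stage("image_spider",
            lambda urls: [{"url": u, "bytes": b""} for u in urls],
            ["feature_computation"])
     .stage("feature_computation",
            lambda imgs: [{**i, "features": len(i["url"])} for i in imgs],
            ["indexing"])
     .stage("indexing", lambda imgs: {"indexed": len(imgs)}))
```

Because stages are addressed by name rather than called directly, each one can be scaled, retried, or replaced independently, which is what the decoupled-DAG design in the real system buys.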
Key technical metrics include:
Throughput: per‑item size ~100 KB, real‑time ingestion ~100 qps, full‑web ingestion ~10 k qps.
Scalability: cloud‑native deployment with elastic compute.
Stability: no data loss, automatic retry/replay, minute‑level completion for real‑time data and day‑level completion for full‑web data.
Effectiveness: accurate image‑page link relationships.
R&D efficiency: language‑agnostic support (C++, Go, PHP) and reusable business logic.
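A quick back-of-envelope calculation from the throughput figures above shows the scale involved: at full-web rates, the raw payloads alone approach a gigabyte per second, which is why passing heavy data by reference (described later) matters so much.

```python
# Back-of-envelope from the stated metrics: ~100 KB per item, ~10k qps full-web.
ITEM_KB = 100
FULL_WEB_QPS = 10_000

bytes_per_sec = ITEM_KB * 1024 * FULL_WEB_QPS
gb_per_sec = bytes_per_sec / 1024**3          # payload volume per second
tb_per_day = bytes_per_sec * 86_400 / 1024**4  # payload volume per day
```

Roughly 0.95 GB/s, or about 80 TB of payload traffic per day if every byte were pushed through the message queue.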
The architecture centers on a streaming real‑time processing system that also supports batch jobs, using elastic compute, event‑driven design, and decoupled DAG execution.
Infrastructure components include storage (Table, BDRP/Redis, UNDB, BOS), message queue (BigPipe), service frameworks (BaiduRPC, GDP‑Go, ODP‑PHP), pipeline scheduler (Odyssey), flow‑control system, elastic compute platform (Qianren), content relationship engine, and offline micro‑service components (Tigris).
Optimization practices focus on reducing message‑queue cost by transmitting references (trigger messages) and using side‑cache for operator outputs, implementing back‑pressure and flow‑control to smooth traffic spikes, and handling GPU bottlenecks through workload splitting, elastic resource allocation, and event‑driven scheduling.
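The reference-passing pattern can be sketched concretely: a producer writes its heavy output to a shared side-cache and publishes only a small trigger message carrying the key, so the queue moves tens of bytes instead of ~100 KB per item. The class and function names below are illustrative, not BigPipe or Baidu's cache API:

```python
import hashlib
import json

class SideCache:
    """Stand-in for a shared KV store (e.g. a Redis-like cache)."""

    def __init__(self):
        self._kv = {}

    def put(self, key, value):
        self._kv[key] = value

    def get(self, key):
        return self._kv[key]

def publish_with_reference(cache, queue, payload: bytes) -> str:
    """Store the heavy payload in the side-cache and enqueue a tiny
    trigger message; downstream operators fetch the bytes on demand."""
    key = hashlib.sha1(payload).hexdigest()
    cache.put(key, payload)
    queue.append(json.dumps({"ref": key, "size": len(payload)}))
    return key

def consume_reference(cache, trigger: str) -> bytes:
    """Resolve a trigger message back into the original payload."""
    return cache.get(json.loads(trigger)["ref"])
```

Content-addressing the cache key (here a SHA-1 of the payload) also deduplicates identical outputs for free, at the cost of requiring the cache to outlive the slowest consumer.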
The content relationship engine models the web‑image ecosystem as a three‑part graph (C‑content, O‑object URL, F‑from URL) stored across three tables (C, O, F) optimized for scan‑order, SSD‑based random reads, and version‑based validation to ensure consistency.
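The version-based validation idea can be sketched as follows: each of the C/O/F tables records the ingestion version under which a row was written, and an image-page relationship is trusted only when the versions along the whole C→O→F chain agree. The table schemas and helper below are assumptions for illustration, not the actual Table layout:

```python
# Three toy tables, each keyed for its dominant access pattern:
#   C: content signature -> features      (SSD-friendly random reads)
#   O: object (image) URL -> signature    (scanned in URL order)
#   F: from (page) URL -> object URLs     (scanned in URL order)
C = {"sig-1": {"features": [0.1, 0.2], "version": 7}}
O = {"http://img/a.jpg": {"sig": "sig-1", "version": 7}}
F = {"http://page/x": {"objects": ["http://img/a.jpg"], "version": 6}}

def valid_links(page_url):
    """Return image links on a page whose C/O/F versions all agree;
    a stale row in any one table invalidates the relationship."""
    f = F[page_url]
    out = []
    for obj in f["objects"]:
        o = O.get(obj)
        if not o:
            continue
        c = C.get(o["sig"])
        if c and f["version"] == o["version"] == c["version"]:
            out.append(obj)
    return out
```

Because the three tables are updated independently by different pipeline stages, the version check is what prevents a half-propagated update from surfacing an inconsistent image-page link.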
Additional practices improve data source reuse, DAG output reuse, resource storage reuse with multi‑tenant reference counting, multi‑language support via RPC isolation, and comprehensive monitoring, tracing, and alerting to reduce maintenance overhead.
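Multi-tenant resource reuse with reference counting can be sketched like this: identical content is stored once, each stored blob tracks which tenants currently reference it, and the bytes are reclaimed only when the last tenant releases them. The store below is illustrative, not Baidu's actual storage API:

```python
class RefCountedStore:
    """Toy multi-tenant blob store: identical content is stored once;
    deletion happens only when the last referencing tenant releases it."""

    def __init__(self):
        self.blobs = {}   # key -> bytes
        self.owners = {}  # key -> set of tenant ids

    def acquire(self, tenant, key, data=None):
        if key not in self.blobs:
            if data is None:
                raise KeyError(key)
            self.blobs[key] = data        # first writer stores the bytes
        self.owners.setdefault(key, set()).add(tenant)
        return self.blobs[key]

    def release(self, tenant, key):
        owners = self.owners.get(key, set())
        owners.discard(tenant)
        if not owners:                    # last reference gone: reclaim space
            self.blobs.pop(key, None)
            self.owners.pop(key, None)
```

The same key acquired by many tenants costs one copy of the bytes, which is the storage-reuse win; the trade-off is that every delete path must go through `release` so counts stay accurate.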
Overall, the Imazon platform demonstrates how Baidu achieves high‑throughput, low‑cost, and reliable image processing and multimodal retrieval at internet scale.
Source: the "High Availability Architecture" official account.