
Overview of Baidu's Wànxiàng System for Large‑Scale Rich Media Processing

Baidu’s Wànxiàng system processes billions of images and videos daily. It extracts low‑ and high‑level features, links related media, and aggregates semantic attributes in a scalable, timely architecture built on heterogeneous CPU, GPU, and FPGA resources, powering accurate, low‑latency rich‑media search and recommendation.

Baidu Geek Talk

With the rapid growth of rich‑media content (images, videos, audio) on the Internet, traditional web‑page‑centric search faces new challenges. Users now consume information through native apps and short‑video platforms, resulting in diverse content formats and fragmented feedback signals.

The Baidu search engine therefore needs to handle massive amounts of multimedia data, extract semantic features, and aggregate user behavior across multiple devices and applications.

Wànxiàng (meaning “all‑encompassing”) is Baidu’s internal system for processing, indexing, and retrieving rich‑media assets at scale. It supports all image and video processing required by Baidu’s products, manages billions of media entities, and serves billions of processing requests daily.

The system is built around three core subsystems:

Qianren (Blades): extracts low‑level and high‑level features from individual media items (e.g., OCR, scene detection, clarity, tags). It orchestrates tens of thousands of CPU cores along with GPU and FPGA resources, and expresses every feature computation as a DAG job so that work can be scheduled efficiently and duplicate computation avoided.

Chuyu (Initial): analyzes relationships between media entities (e.g., duplication, similarity, containment, event‑based clustering). It performs fingerprint‑level comparisons to discover cross‑media links, such as video clips derived from the same source.

Danding (Athanors): stores and aggregates feature data, merging the attributes of identical or similar entities so that downstream retrieval can operate on content‑level signals rather than page‑level signals.
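To make the DAG‑job idea behind Qianren concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Baidu's implementation: the feature graph `FEATURE_DEPS`, the `FeatureRunner` class, and the placeholder `compute` function are all hypothetical. It shows how a feature request can be expanded into a dependency‑ordered job list, and how keying results by content hash lets repeated requests skip work that has already been done.

```python
import hashlib
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical feature graph: each feature lists the features it depends on.
FEATURE_DEPS = {
    "decode": [],
    "ocr": ["decode"],
    "scene": ["decode"],
    "clarity": ["decode"],
    "tags": ["ocr", "scene"],
}


def required_jobs(requested):
    """Return the requested features plus their transitive dependencies,
    ordered so that every dependency comes before its dependents."""
    needed = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(FEATURE_DEPS[name])
    graph = {name: FEATURE_DEPS[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())


def compute(feature, media_bytes, inputs):
    """Placeholder for a real extractor (assumption: returns a description)."""
    return f"{feature}:{len(media_bytes)}B:deps={sorted(inputs)}"


class FeatureRunner:
    """Runs feature jobs and deduplicates them by (content hash, feature)."""

    def __init__(self):
        self.cache = {}     # (content_hash, feature) -> result
        self.executed = []  # jobs that were actually computed

    def run(self, media_bytes, requested):
        content_hash = hashlib.sha256(media_bytes).hexdigest()
        for feature in required_jobs(requested):
            key = (content_hash, feature)
            if key not in self.cache:
                inputs = {
                    dep: self.cache[(content_hash, dep)]
                    for dep in FEATURE_DEPS[feature]
                }
                self.cache[key] = compute(feature, media_bytes, inputs)
                self.executed.append(key)
        return {f: self.cache[(content_hash, f)] for f in requested}


runner = FeatureRunner()
runner.run(b"fake image bytes", ["tags"])             # computes decode, ocr, scene, tags
runner.run(b"fake image bytes", ["tags", "clarity"])  # only clarity is new
print(len(runner.executed))
```

In this sketch the second request reuses the cached `decode`, `ocr`, `scene`, and `tags` results, so only `clarity` is computed. A production scheduler would also have to place jobs on CPU, GPU, or FPGA workers, which is outside the scope of this example.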

Wànxiàng also includes auxiliary services for cropping, transcoding, and editing. Its design emphasizes two key metrics:

Scalability: the ability to process petabytes of images and videos, leveraging heterogeneous resources (CPU, GPU, FPGA) across hundreds of thousands of cores.

Timeliness: delivering processed features, indexes, and quality signals within product iteration cycles to ensure up‑to‑date search relevance.

By converting rich media into structured semantic attributes and aggregating user feedback at the content level, Wànxiàng enables Baidu to support image search, video search, recommendation, and other rich‑media‑driven services with high accuracy and low latency.
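The move from page‑level to content‑level signals can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the 8‑bit integer fingerprints, the Hamming‑distance threshold, and the function names `group_similar` and `aggregate_feedback` are hypothetical, and the article does not describe the actual fingerprinting or clustering method. The sketch links media whose fingerprints are close, then sums per‑page click counts over each linked group.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def group_similar(fingerprints, max_distance=3):
    """Map each media id to a group representative using union-find.

    Two items are linked when their fingerprints differ by at most
    max_distance bits. The pairwise loop is O(n^2), which is fine for a
    demo; a system at real scale would need an index instead.
    """
    parent = {media_id: media_id for media_id in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(fingerprints)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if hamming(fingerprints[a], fingerprints[b]) <= max_distance:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    # Keep the lexicographically smallest id as representative.
                    parent[max(root_a, root_b)] = min(root_a, root_b)
    return {media_id: find(media_id) for media_id in fingerprints}


def aggregate_feedback(groups, clicks):
    """Sum per-item click counts into per-group (content-level) totals."""
    totals = defaultdict(int)
    for media_id, count in clicks.items():
        totals[groups[media_id]] += count
    return dict(totals)


# The same image hosted on two pages (fingerprints differ by 1 bit),
# plus an unrelated image on a third page.
fingerprints = {
    "page_a/img": 0b10110000,
    "page_b/img": 0b10110001,
    "page_c/img": 0b01001111,
}
clicks = {"page_a/img": 10, "page_b/img": 5, "page_c/img": 2}

groups = group_similar(fingerprints)
print(aggregate_feedback(groups, clicks))
# prints {'page_a/img': 15, 'page_c/img': 2}
```

The two copies of the same image pool their feedback into one content‑level total, while the unrelated image keeps its own signal.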
