Overview of Baidu's Wànxiàng System for Large‑Scale Rich Media Processing
Baidu’s Wànxiàng system processes billions of images and videos daily by extracting low‑ and high‑level features, linking related media, and aggregating semantic attributes in a scalable, timely architecture that leverages thousands of CPU, GPU, and FPGA cores to power accurate, low‑latency rich‑media search and recommendation.
With the rapid growth of rich‑media content (images, videos, audio) on the Internet, traditional web‑page‑centric search faces new challenges. Users now consume information through native apps and short‑video platforms, resulting in diverse content formats and fragmented feedback signals.
The Baidu search engine therefore needs to handle massive amounts of multimedia data, extract semantic features, and aggregate user behavior across multiple devices and applications.
Wànxiàng (meaning “all‑encompassing”) is Baidu’s internal system designed to process, index, and retrieve rich‑media assets at scale. It supports all image and video processing required by Baidu’s products, managing billions of media entities and serving billions of processing requests daily.
The system is built around three core subsystems:
Qianren (Blades): extracts low‑level and high‑level features from individual media items (e.g., OCR, scene detection, clarity, tags). It orchestrates tens of thousands of CPU cores and GPU/FPGA resources, converting all feature calculations into DAG jobs for efficient scheduling and deduplication.
Chuyu (Initial): analyzes relationships between media entities (e.g., duplicate, similar, containment, event‑based clustering). It performs fingerprint‑level comparisons to discover cross‑media links such as video clips derived from the same source.
Danding (Athanors): stores and aggregates feature data, merging attributes of identical or similar entities so that downstream retrieval can operate on content‑level signals rather than page‑level signals.
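To make the Qianren idea of "feature calculations as DAG jobs with deduplication" concrete, here is a minimal sketch, assuming a hypothetical set of operators (decode, ocr, scene, clarity, tags) and a simple in-memory result cache. It is not Baidu's implementation. Requested features are expanded into their dependency DAG, shared dependencies such as decoding are planned only once, and results are cached per (media_id, feature) so a later request for an already computed feature does no new work.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical operator graph: feature -> features it must consume first.
DEPS = {
    "decode": set(),
    "ocr": {"decode"},
    "scene": {"decode"},
    "clarity": {"decode"},
    "tags": {"scene", "ocr"},
}

def plan(requested):
    """Expand requested features into one deduplicated DAG; return a topological run order."""
    graph = {}
    stack = list(requested)
    while stack:
        feat = stack.pop()
        if feat in graph:  # already planned: a shared dependency is scheduled only once
            continue
        graph[feat] = DEPS[feat]
        stack.extend(DEPS[feat])
    return list(TopologicalSorter(graph).static_order())

class FeatureCache:
    """Stores results keyed by (media_id, feature) so repeated requests are not recomputed."""

    def __init__(self):
        self.store = {}
        self.computed = 0  # number of operator executions actually performed

    def run(self, media_id, requested, ops):
        for feat in plan(requested):
            key = (media_id, feat)
            if key not in self.store:
                # Topological order guarantees every dependency is already in the store.
                inputs = {d: self.store[(media_id, d)] for d in DEPS[feat]}
                self.store[key] = ops[feat](media_id, inputs)
                self.computed += 1
        return {f: self.store[(media_id, f)] for f in requested}

def stub(feat):
    """Hypothetical stand-in operator: returns a string naming the feature and media id."""
    return lambda media_id, inputs: f"{feat}({media_id})"

if __name__ == "__main__":
    cache = FeatureCache()
    ops = {f: stub(f) for f in DEPS}
    cache.run("img42", ["tags", "clarity"], ops)
    cache.run("img42", ["ocr"], ops)  # served entirely from the cache
    print(cache.computed)             # 5: decode ran once, shared by ocr, scene, and clarity
```

In a real cluster the topological order would be handed to a distributed scheduler rather than run in a loop, but the deduplication principle is the same: identical (media, operator) nodes collapse into one job.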
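Chuyu's fingerprint-level comparison can be illustrated with a common perceptual-hash technique. The sketch below assumes a difference hash (dHash) over an already downsampled 8x9 grayscale grid, Hamming-distance thresholds chosen purely for illustration, and a naive sliding-window check for whether a clip's per-frame fingerprints appear as a contiguous run inside a longer video. The article does not say which fingerprint Baidu uses; dHash is a stand-in.

```python
def dhash(gray):
    """64-bit difference hash of an 8-row x 9-column grayscale grid (already downsampled)."""
    if len(gray) != 8 or any(len(row) != 9 for row in gray):
        raise ValueError("expected an 8x9 grid")
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):  # 8 comparisons per row -> 64 bits total
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def relation(fp_a, fp_b, dup_max=2, sim_max=10):
    """Classify a pair of fingerprints; thresholds are hypothetical, not Baidu's."""
    d = hamming(fp_a, fp_b)
    if d <= dup_max:
        return "duplicate"
    if d <= sim_max:
        return "similar"
    return "unrelated"

def clip_offset(clip_fps, video_fps, max_dist=4):
    """Return the frame offset where every clip frame matches the video within
    max_dist bits, or None. Brute force O(n*m); a production system would index."""
    n, m = len(clip_fps), len(video_fps)
    for start in range(m - n + 1):
        if all(hamming(c, video_fps[start + i]) <= max_dist
               for i, c in enumerate(clip_fps)):
            return start
    return None
```

For example, `clip_offset([5, 6], [1, 2, 5, 6, 9], max_dist=0)` returns `2`, flagging the short sequence as a clip cut from the longer one starting at its third frame.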
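Danding's merging of identical or similar entities can be sketched as union-find clustering over relation links, followed by per-cluster attribute aggregation. This assumes the links are pairs of media IDs already judged duplicate or similar (the kind of cross-media links Chuyu produces) and uses hypothetical attribute names (pages, clicks, tags). The point it shows is that signals scattered across many pages hosting the same content end up attached to one content-level record.

```python
from collections import defaultdict

class UnionFind:
    """Disjoint sets whose root is always the smallest member id."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # smaller id stays canonical

def aggregate(entities, links):
    """entities: media_id -> {"pages": set, "clicks": int, "tags": set} (hypothetical schema).
    links: iterable of (media_id, media_id) pairs judged duplicate or similar.
    Returns canonical_id -> merged content-level record."""
    uf = UnionFind()
    for mid in entities:
        uf.find(mid)  # register singletons
    for a, b in links:
        uf.union(a, b)
    merged = defaultdict(lambda: {"members": set(), "pages": set(), "clicks": 0, "tags": set()})
    for mid, attrs in entities.items():
        rec = merged[uf.find(mid)]
        rec["members"].add(mid)
        rec["pages"] |= attrs["pages"]   # every page that hosts this content
        rec["clicks"] += attrs["clicks"]  # page-level feedback summed per content item
        rec["tags"] |= attrs["tags"]
    return dict(merged)
```

Retrieval can then rank the merged record using the combined clicks and tags instead of treating each hosting page's copy as an independent, weaker signal.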
Wànxiàng also includes auxiliary services for cropping, transcoding, and editing. Its design emphasizes two key metrics:
Scalability: the ability to process petabytes of images and videos, leveraging heterogeneous resources (CPU, GPU, FPGA) across hundreds of thousands of cores.
Timeliness: delivering processed features, indexes, and quality signals within product iteration cycles to ensure up‑to‑date search relevance.
By converting rich‑media into structured semantic attributes and aggregating user feedback at the content level, Wànxiàng enables Baidu to support image search, video search, recommendation, and other rich‑media‑driven services with high accuracy and low latency.
Baidu Geek Talk