Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers
Baidu’s Canghai Storage unifies its metadata, hierarchical namespace, and data layers on a Meta‑Aware technology base that has evolved over three generations. The base scales to trillions of metadata items and ZB‑scale data, combining a distributed transactional KV store, a single‑machine‑distributed namespace, and an online erasure‑coding micro‑service layer to deliver high performance, low cost, and seamless scalability.
With the rapid development of the AI era, storage technology faces higher demands for massive scale, high performance, and low cost. Baidu Canghai Storage has built a highly reusable unified technology base to address common cloud‑storage problems and make upper‑layer storage system iteration more efficient.
The unified technology base consists of three core components: a unified metadata base, a unified hierarchical Namespace, and a unified data base.
Unified Metadata Base is a distributed transactional key‑value store designed for metadata scenarios. It follows a Meta‑Aware design, providing trillion‑level metadata capacity and supporting object storage (BOS) and file storage (CFS/AFS).
Unified Hierarchical Namespace is built on the metadata base and has evolved into a single‑machine distributed integrated architecture with high performance and good scalability.
Unified Data Base is an online erasure‑coding (EC) storage system that delivers high throughput and low‑cost unified data storage, using a micro‑service architecture without logical single points and supporting ZB‑scale data.
The metadata base has undergone three generations. The first generation relied on multiple systems (MySQL and a distributed KV store) leading to high operational cost and limited linear scalability. The second generation (2017) introduced a self‑developed NewSQL project, improving scalability but still suffering from performance gaps. The third generation redesigned the system to be Meta‑Aware, deeply understanding metadata semantics, which brings optimizations in partitioning, transaction handling, engine selection, and SDK design.
Key optimizations include custom partition‑splitting and co‑located placement to keep related metadata in the same shard, a 5‑second TTL in‑memory MVCC to reduce GC overhead, flexible support for synchronous and asynchronous secondary indexes, and the ability to choose the most suitable storage engine (LSM‑Tree or in‑memory hash) based on access patterns.
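The co‑located placement idea can be illustrated with a minimal sketch: route each metadata entry by its parent (directory or bucket) ID only, so sibling entries always land on the same shard, while the full composite key keeps them sorted together within that shard. The shard count, key layout, and hash choice here are illustrative assumptions, not Baidu's actual scheme.

```python
import hashlib

NUM_SHARDS = 64  # illustrative shard count, not Baidu's actual topology

def shard_for(parent_id: int) -> int:
    """Route by the parent directory ID only, so all entries under the
    same directory land on the same shard (co-located placement)."""
    digest = hashlib.sha256(str(parent_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def meta_key(parent_id: int, name: str) -> bytes:
    """Composite key: routing uses only parent_id, but the stored key is
    (parent_id, name) so siblings sort contiguously inside the shard."""
    return f"{parent_id:020d}/{name}".encode()

# All entries under directory 42 hit one shard; a directory listing
# then becomes a single-shard range scan instead of a scatter-gather.
assert shard_for(42) == shard_for(42)
assert meta_key(42, "a.txt") < meta_key(42, "b.txt")
```

With this layout, a single‑directory operation (list, rename within a directory) touches one shard, which is what makes the single‑partition 1PC fast path possible.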
The hierarchical Namespace also evolved through three stages: a single‑machine HDFS‑like solution (limited to ~1 billion metadata items), a distributed database‑based solution (linear scalability but sacrificing locality), and finally a single‑machine‑distributed integrated solution that provides seamless scaling from low‑latency single‑node operation to distributed operation.
Optimizations for the distributed Namespace include reducing multi‑partition transactions to single‑partition one‑phase commits (1PC), path‑parsing acceleration via an Index shard (cutting lookups from N RPCs, one per path component, to a single RPC), and improved rename and write performance by offloading directory semantics to the metadata base.
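The N‑to‑1 path‑parsing win can be shown with a toy in‑memory model (the data structures and names here are hypothetical, purely to contrast the two lookup patterns): classic resolution walks the tree one dentry at a time, while an Index shard maps a full path to its inode in one lookup.

```python
# Hypothetical model contrasting per-component path resolution with a
# single Index-shard lookup; dict lookups stand in for metadata RPCs.

dentries = {  # (parent_inode, name) -> inode
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 3,
}
index_shard = {"/a/b/c": 3}  # full path -> inode, kept in sync with dentries

def resolve_per_component(path: str):
    """Classic walk: one lookup ('RPC') per path component, N total."""
    inode, rpcs = 0, 0
    for name in path.strip("/").split("/"):
        inode = dentries[(inode, name)]
        rpcs += 1
    return inode, rpcs

def resolve_via_index(path: str):
    """Index-shard fast path: one lookup regardless of path depth."""
    return index_shard[path], 1

print(resolve_per_component("/a/b/c"))  # (3, 3): inode 3, 3 lookups
print(resolve_via_index("/a/b/c"))      # (3, 1): inode 3, 1 lookup
```

The trade‑off, as in any such design, is keeping the full‑path index consistent under renames, which is one reason rename handling is called out as its own optimization.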
The data base has similarly progressed through three phases: a master‑slave architecture with HDD‑dominant hardware and 3‑replica storage (high cost, limited fault tolerance), a mixed HDD/SSD setup with offline EC and multi‑data‑center support, and the current third‑generation micro‑service architecture with online EC, low effective redundancy factors (e.g., 1.5×, 1.33×), and no logical single points, offering higher availability, scalability, and faster iteration.
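The cited redundancy factors follow directly from erasure‑coding arithmetic: a scheme with k data blocks and m parity blocks stores (k + m)/k physical bytes per logical byte. The specific (k, m) pairs below are illustrative choices that reproduce the 1.5× and 1.33× figures; the article does not state Baidu's actual coding parameters.

```python
from fractions import Fraction

def ec_overhead(k: int, m: int) -> Fraction:
    """Storage overhead of a (k data + m parity) erasure code:
    (k + m) / k physical bytes per logical byte."""
    return Fraction(k + m, k)

# Illustrative schemes matching the factors cited in the article;
# the exact (k, m) choices are assumptions, not Baidu's published layout.
print(float(ec_overhead(6, 3)))  # 1.5 : survives any 3 block failures
print(float(ec_overhead(9, 3)))  # ~1.33 : same fault tolerance, wider stripe
print(float(ec_overhead(1, 2)))  # 3.0 : plain 3-replica, for comparison
```

Compared with 3‑replica storage (3.0× overhead), online EC at 1.33× halves the physical footprint more than twice over while still tolerating multiple failures, which is the cost argument behind the third‑generation design.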
Overall, Baidu Canghai’s unified technology base, driven by the Meta‑Aware concept, significantly reduces system overhead, improves performance, and enhances scalability and flexibility across metadata, namespace, and data layers.
Further technical sharing on architecture, scalability, stability, and high performance will be released on the Baidu Intelligent Cloud WeChat public account.
Baidu Geek Talk
Follow us to discover more Baidu tech insights.