Bilibili Object Storage Service (BOSS) Design: Building a Large-Scale Distributed Storage System in 13 Days
In just 13 days Bilibili transformed a simple MySQL‑based S3 prototype into the BOSS distributed object storage system by separating metadata and data, adding an RPC abstraction layer, implementing two‑level sharding, switching to RocksDB, and deploying a three‑replica, multi‑zone high‑availability architecture.
This article details the design and implementation of Bilibili's object storage system (BOSS). The summary below covers Days 1 through 6 of the journey from a simple MySQL-based prototype to a distributed storage architecture.
Day 1 - Basic S3 Protocol Implementation: The S3 protocol exposes a small set of operations (PUT/GET/DELETE/LIST). The initial implementation used MySQL with a single table storing both data and metadata, with bucketName + objectName as the primary key.
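The single-table design can be sketched roughly as follows. This is only an illustration, not the BOSS code: Python's built-in sqlite3 stands in for MySQL, and the table name, column names, and function names are assumptions.

```python
import sqlite3

# sqlite3 stands in for MySQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE objects (
        bucket_name TEXT NOT NULL,
        object_name TEXT NOT NULL,
        data        BLOB NOT NULL,
        PRIMARY KEY (bucket_name, object_name)
    )
    """
)


def put_object(bucket, name, data):
    # A PUT on an existing key overwrites the previous object.
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)", (bucket, name, data)
    )


def get_object(bucket, name):
    row = conn.execute(
        "SELECT data FROM objects WHERE bucket_name = ? AND object_name = ?",
        (bucket, name),
    ).fetchone()
    return row[0] if row else None


def delete_object(bucket, name):
    conn.execute(
        "DELETE FROM objects WHERE bucket_name = ? AND object_name = ?",
        (bucket, name),
    )


def list_objects(bucket, prefix=""):
    # Prefix match via substr so that '%' or '_' in the prefix are literal.
    rows = conn.execute(
        "SELECT object_name FROM objects "
        "WHERE bucket_name = ? AND substr(object_name, 1, ?) = ? "
        "ORDER BY object_name",
        (bucket, len(prefix), prefix),
    )
    return [r[0] for r in rows]
```

Storing the payload in the same row as the key keeps Day 1 simple, but every listing and metadata lookup then shares a table with large blobs, which is what motivates the Day 2 split.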
Day 2 - Metadata and Data Separation: Separated metadata (meta table) from data (data table) using object_id as the association key. Introduced Block tables to handle large files - splitting data into blocks with blockID as primary key, and using protobuf to compress blockID lists.
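A minimal sketch of the split, assuming in-memory dictionaries in place of the MySQL tables. The names `meta_table`, `data_table`, `block_table`, and `BLOCK_SIZE` are hypothetical, and the block ID list is kept as a plain Python list rather than a protobuf-encoded value.

```python
import itertools

BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo, a hypothetical value

meta_table = {}   # (bucket, object_name) -> {"object_id": int, "size": int}
data_table = {}   # object_id -> ordered list of block IDs
block_table = {}  # block_id -> bytes for that block

_object_ids = itertools.count(1)
_block_ids = itertools.count(1)


def put_object(bucket, name, data):
    object_id = next(_object_ids)
    block_ids = []
    # Split the payload into fixed-size blocks, each stored under its own ID.
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = next(_block_ids)
        block_table[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    data_table[object_id] = block_ids
    # Metadata only records the association key, never the payload itself.
    # Note: overwriting a key leaves the old blocks orphaned in this sketch.
    meta_table[(bucket, name)] = {"object_id": object_id, "size": len(data)}
    return object_id


def get_object(bucket, name):
    meta = meta_table.get((bucket, name))
    if meta is None:
        return None
    block_ids = data_table[meta["object_id"]]
    return b"".join(block_table[block_id] for block_id in block_ids)
```

With this layout, listing or renaming objects touches only the small metadata rows, while large files are read and written one block at a time.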
Day 3 - Abstraction Layer: Added RPC layer (S3 metadata service and IO service) between gateway and MySQL to decouple from specific storage implementations.
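One way to picture this decoupling is a gateway that only talks to two service interfaces. In the sketch below the interfaces are plain in-process Python classes rather than real RPC stubs, and every class and method name is an assumption made for illustration.

```python
from abc import ABC, abstractmethod


class MetadataService(ABC):
    """Hypothetical interface of the S3 metadata service."""

    @abstractmethod
    def put_meta(self, bucket, name, object_id): ...

    @abstractmethod
    def get_meta(self, bucket, name): ...


class IOService(ABC):
    """Hypothetical interface of the IO (data) service."""

    @abstractmethod
    def write(self, data): ...

    @abstractmethod
    def read(self, object_id): ...


class InMemoryMetadata(MetadataService):
    def __init__(self):
        self._meta = {}

    def put_meta(self, bucket, name, object_id):
        self._meta[(bucket, name)] = object_id

    def get_meta(self, bucket, name):
        return self._meta.get((bucket, name))


class InMemoryIO(IOService):
    def __init__(self):
        self._data = {}
        self._next_id = 1

    def write(self, data):
        object_id = self._next_id
        self._next_id += 1
        self._data[object_id] = data
        return object_id

    def read(self, object_id):
        return self._data[object_id]


class Gateway:
    """Depends only on the interfaces, never on a concrete storage backend."""

    def __init__(self, meta, io):
        self.meta = meta
        self.io = io

    def put(self, bucket, name, data):
        object_id = self.io.write(data)
        self.meta.put_meta(bucket, name, object_id)

    def get(self, bucket, name):
        object_id = self.meta.get_meta(bucket, name)
        return None if object_id is None else self.io.read(object_id)
```

Because the gateway sees only `MetadataService` and `IOService`, a MySQL-backed implementation can later be replaced by a different engine without touching gateway code.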
Day 4 - Sharding Strategy: Implemented two-level mapping using virtual shards (22,333 shards). First-level maps key to virtual shard via hash, second-level maps shard to storage node. This enables horizontal scaling and simplifies capacity expansion.
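The two-level mapping can be sketched as below. The shard count comes from the article; the hash function (MD5), the round-robin initial assignment, and all function names are assumptions, since the summary does not say which ones BOSS uses.

```python
import hashlib

NUM_VIRTUAL_SHARDS = 22333  # shard count given in the article


def key_to_shard(key):
    # Level 1: a stable hash of the key picks a virtual shard.
    # MD5 is an assumption; any stable hash would serve the same purpose.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_SHARDS


def build_shard_map(nodes):
    # Level 2: an explicit table maps each virtual shard to a storage node.
    # Round-robin assignment is a hypothetical starting layout.
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_VIRTUAL_SHARDS)}


def locate(key, shard_map):
    return shard_map[key_to_shard(key)]


def move_shard(shard_map, shard, new_node):
    # Expansion only rewrites level-2 entries; key hashing never changes.
    shard_map[shard] = new_node
```

Because keys hash to a fixed set of virtual shards, adding a node means reassigning some shards in the table and copying their data, rather than rehashing every key.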
Day 5 - Storage Engine Replacement: Replaced MySQL with RocksDB as the single-node storage engine, organized in three layers: RPC layer, shard layer, and engine layer.
Day 6 - High Availability: Introduced resource pools and availability zones for fault isolation. Implemented a three-replica strategy, with each replica placed in a different availability zone. Used a star write model (the writer sends to all replicas, and the write succeeds once a majority acknowledge it) instead of a complex consensus protocol: because keys are globally unique (auto-increment IDs), no write conflicts occur.
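The star write rule can be illustrated with a small simulation. The `Replica` class, the zone names, and the use of `ConnectionError` to model a failed replica are all assumptions, and the replicas are written sequentially here for simplicity, whereas a real system would presumably send the writes in parallel.

```python
class Replica:
    """Hypothetical replica living in one availability zone."""

    def __init__(self, zone, healthy=True):
        self.zone = zone
        self.healthy = healthy
        self.store = {}

    def write(self, key, value):
        if not self.healthy:
            raise ConnectionError(f"replica in {self.zone} is unreachable")
        self.store[key] = value


def star_write(replicas, key, value):
    """Send the write to every replica; succeed if a majority acknowledge."""
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, value)
            acks += 1
        except ConnectionError:
            # A failed replica simply does not count toward the majority.
            pass
    majority = len(replicas) // 2 + 1
    return acks >= majority
```

With three replicas the majority is two, so the write survives the loss of one availability zone. No ordering protocol is needed in this sketch because each key is written only once, which mirrors the article's argument about unique auto-increment IDs.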
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.