
Bilibili Object Storage Service (BOSS) Design: Building a Large-Scale Distributed Storage System in 13 Days

In just 13 days, Bilibili transformed a simple MySQL-based S3 prototype into the BOSS distributed object storage system by separating metadata from data, adding an RPC abstraction layer, implementing two-level sharding, switching to RocksDB, and deploying a three-replica, multi-zone high-availability architecture.

Bilibili Tech

This article details the design and implementation of Bilibili's object storage system (BOSS), tracing its evolution day by day from a simple MySQL-based prototype to a distributed storage architecture.

Day 1 - Basic S3 Protocol Implementation: The S3 protocol provides simple interfaces (PUT/GET/DEL/LIST). The initial implementation uses MySQL with a single table storing both data and metadata, using bucketName + objectName as the primary key.
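The Day-1 design can be sketched as a single table keyed by bucket and object name, with the four S3-style operations on top. This is a minimal illustration only: sqlite3 stands in for MySQL, and the table and function names are assumptions, not Bilibili's actual schema.

```python
import sqlite3

# One table holds both data and metadata, keyed by (bucket_name, object_name),
# mirroring the Day-1 single-table design. sqlite3 stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE objects (
        bucket_name TEXT NOT NULL,
        object_name TEXT NOT NULL,
        data        BLOB NOT NULL,
        PRIMARY KEY (bucket_name, object_name)
    )
""")

def put_object(bucket, key, data):
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)", (bucket, key, data)
    )

def get_object(bucket, key):
    row = conn.execute(
        "SELECT data FROM objects WHERE bucket_name = ? AND object_name = ?",
        (bucket, key),
    ).fetchone()
    return row[0] if row else None

def delete_object(bucket, key):
    conn.execute(
        "DELETE FROM objects WHERE bucket_name = ? AND object_name = ?",
        (bucket, key),
    )

def list_objects(bucket, prefix=""):
    rows = conn.execute(
        "SELECT object_name FROM objects "
        "WHERE bucket_name = ? AND object_name LIKE ? ORDER BY object_name",
        (bucket, prefix + "%"),
    ).fetchall()
    return [r[0] for r in rows]
```

The compound primary key is what makes PUT idempotent and LIST a simple prefix scan; it is also what the later days progressively break apart.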

Day 2 - Metadata and Data Separation: Separated metadata (meta table) from data (data table) using object_id as the association key. Introduced Block tables to handle large files - splitting data into blocks with blockID as primary key, and using protobuf to compress blockID lists.
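The Day-2 split can be sketched as two stores: a meta table mapping each object to an ordered blockID list, and a data table mapping each blockID to its bytes. The dicts, `BLOCK_SIZE`, and ID scheme below are illustrative stand-ins for the real MySQL tables (where, per the article, the blockID list is additionally protobuf-compressed).

```python
BLOCK_SIZE = 4  # tiny for demonstration; real block sizes are far larger

meta_table = {}   # (bucket, key) -> {"object_id": ..., "block_ids": [...]}
data_table = {}   # block_id -> bytes
_next_id = iter(range(1, 1 << 30))  # stand-in for auto-increment IDs

def put_object(bucket, key, payload):
    """Split the payload into blocks; store blocks and the blockID list."""
    object_id = next(_next_id)
    block_ids = []
    for off in range(0, len(payload), BLOCK_SIZE):
        block_id = next(_next_id)
        data_table[block_id] = payload[off:off + BLOCK_SIZE]
        block_ids.append(block_id)
    meta_table[(bucket, key)] = {"object_id": object_id, "block_ids": block_ids}

def get_object(bucket, key):
    """Reassemble the object by reading its blocks in order."""
    meta = meta_table[(bucket, key)]
    return b"".join(data_table[b] for b in meta["block_ids"])
```

Separating the two tables means a large object no longer inflates a single row; only the compact blockID list lives next to the metadata.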

Day 3 - Abstraction Layer: Added RPC layer (S3 metadata service and IO service) between gateway and MySQL to decouple from specific storage implementations.
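The Day-3 decoupling can be sketched as two service interfaces that the gateway depends on, so the backing store behind them can be swapped without touching gateway code. The interface and class names are illustrative assumptions, not BOSS's actual RPC definitions.

```python
from abc import ABC, abstractmethod

class MetaService(ABC):
    """Metadata service interface (backed by MySQL at this stage)."""
    @abstractmethod
    def put_meta(self, bucket, key, block_ids): ...
    @abstractmethod
    def get_meta(self, bucket, key): ...

class IOService(ABC):
    """Block IO service interface."""
    @abstractmethod
    def write_block(self, block_id, data): ...
    @abstractmethod
    def read_block(self, block_id): ...

class Gateway:
    """Depends only on the service interfaces, never on a concrete store."""
    def __init__(self, meta: MetaService, io: IOService):
        self.meta, self.io = meta, io

    def get_object(self, bucket, key):
        block_ids = self.meta.get_meta(bucket, key)
        return b"".join(self.io.read_block(b) for b in block_ids)

# Trivial in-memory implementations, useful for tests and as a template
# for the later RocksDB-backed engine.
class InMemoryMeta(MetaService):
    def __init__(self): self.table = {}
    def put_meta(self, bucket, key, block_ids): self.table[(bucket, key)] = block_ids
    def get_meta(self, bucket, key): return self.table[(bucket, key)]

class InMemoryIO(IOService):
    def __init__(self): self.blocks = {}
    def write_block(self, block_id, data): self.blocks[block_id] = data
    def read_block(self, block_id): return self.blocks[block_id]
```

This indirection is what makes the Day-5 engine swap (MySQL to RocksDB) a localized change rather than a gateway rewrite.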

Day 4 - Sharding Strategy: Implemented two-level mapping using virtual shards (22,333 shards). First-level maps key to virtual shard via hash, second-level maps shard to storage node. This enables horizontal scaling and simplifies capacity expansion.
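The two-level mapping can be sketched in a few lines: the key-to-shard hash is fixed forever, while the shard-to-node table is the only thing rewritten during expansion. The round-robin node assignment and the `node-N` names below are illustrative assumptions; only the shard count comes from the article.

```python
import hashlib

NUM_SHARDS = 22333  # fixed virtual shard count from the article

# Second level: virtual shard -> storage node. Here shards are spread
# round-robin over three hypothetical nodes; in production this routing
# table is updated when capacity is added.
shard_to_node = {s: f"node-{s % 3}" for s in range(NUM_SHARDS)}

def shard_of(bucket, key):
    """First level: hash the object key to a stable virtual shard."""
    digest = hashlib.md5(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def node_of(bucket, key):
    return shard_to_node[shard_of(bucket, key)]
```

Expansion then only reassigns entries in `shard_to_node` and migrates those shards; clients never recompute which shard a key belongs to, which is what keeps scaling simple.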

Day 5 - Storage Engine Replacement: Replaced MySQL with RocksDB as the single-node storage engine, organized in three layers: RPC layer, shard layer, and engine layer.
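The three-layer organization of a storage node can be sketched as nested objects: an RPC layer that remote clients call, a shard layer that routes each key to its shard, and one engine per shard. A dict-backed `Engine` stands in for RocksDB here (which similarly exposes a per-instance key-value Put/Get interface); all names are illustrative.

```python
class Engine:
    """Engine layer: one embedded KV store per shard (RocksDB in BOSS)."""
    def __init__(self): self._kv = {}
    def put(self, key, value): self._kv[key] = value
    def get(self, key): return self._kv.get(key)

class ShardLayer:
    """Shard layer: routes a key to the engine owning its shard."""
    def __init__(self, num_shards=8):
        self.num_shards = num_shards
        self.engines = [Engine() for _ in range(num_shards)]
    def _engine(self, key):
        return self.engines[hash(key) % self.num_shards]
    def put(self, key, value): self._engine(key).put(key, value)
    def get(self, key): return self._engine(key).get(key)

class RPCLayer:
    """RPC layer: the node's external surface; delegates to the shard layer."""
    def __init__(self, shards): self.shards = shards
    def handle_put(self, key, value): self.shards.put(key, value)
    def handle_get(self, key): return self.shards.get(key)
```

Keeping one engine instance per shard is what lets a shard be migrated or recovered as a unit, independent of its neighbors on the same node.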

Day 6 - High Availability: Introduced resource pools and availability zones for fault isolation. Implemented 3-replica strategy with replicas distributed across different availability zones. Used star write model (write to all replicas, majority success) instead of complex consensus protocols, since keys are globally unique (auto-increment IDs) and no write conflicts occur.
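The star write model can be sketched as a fan-out to all replicas with a majority-acknowledgement check. Because every key is a globally unique auto-increment ID, two writers never race on the same key, which is why no consensus protocol is needed. The dict replicas and `failed_zones` parameter below are illustrative stand-ins for nodes in different availability zones.

```python
REPLICA_COUNT = 3
MAJORITY = REPLICA_COUNT // 2 + 1  # 2 of 3

replicas = [{} for _ in range(REPLICA_COUNT)]  # one per availability zone

def star_write(key, value, failed_zones=()):
    """Write to all replicas; succeed iff a majority acknowledge."""
    acks = 0
    for zone, replica in enumerate(replicas):
        if zone in failed_zones:   # simulated zone outage
            continue
        replica[key] = value
        acks += 1
    return acks >= MAJORITY
```

With replicas pinned to distinct availability zones, losing any single zone still leaves a majority, so writes keep succeeding through a zone failure.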

Tags: System Architecture, Sharding, High Availability, Replication, Distributed Storage, RocksDB, Object Storage, S3 Protocol
Written by Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.