
Bilibili Object Storage Service (BOSS) Design: Building a Large-Scale Distributed Storage System in 13 Days

In just 13 days, Bilibili transformed a simple MySQL-based S3 prototype into the BOSS distributed object storage system by separating metadata from data, adding an RPC abstraction layer, implementing two-level sharding, switching to RocksDB, and deploying a three-replica, multi-zone high-availability architecture.

Bilibili Tech

This article details the design and implementation of Bilibili's object storage system (BOSS), tracing its evolution day by day from a simple MySQL-based prototype to a distributed storage architecture.

Day 1 - Basic S3 Protocol Implementation: The S3 protocol provides simple interfaces (PUT/GET/DEL/LIST). The initial implementation uses MySQL with a single table storing both data and metadata, using bucketName + objectName as the primary key.
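The Day-1 design can be sketched as a single table keyed by bucket and object name, with the four S3-style operations on top. This is a minimal illustration only: sqlite3 stands in for MySQL, and the table and function names are assumptions, not Bilibili's actual schema.

```python
import sqlite3

# One table holds both data and metadata, keyed by (bucket_name, object_name),
# mirroring the Day-1 single-table design. sqlite3 stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE objects (
        bucket_name TEXT NOT NULL,
        object_name TEXT NOT NULL,
        data        BLOB NOT NULL,
        PRIMARY KEY (bucket_name, object_name)
    )
""")

def put_object(bucket, key, data):
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)", (bucket, key, data)
    )

def get_object(bucket, key):
    row = conn.execute(
        "SELECT data FROM objects WHERE bucket_name = ? AND object_name = ?",
        (bucket, key),
    ).fetchone()
    return row[0] if row else None

def delete_object(bucket, key):
    conn.execute(
        "DELETE FROM objects WHERE bucket_name = ? AND object_name = ?",
        (bucket, key),
    )

def list_objects(bucket, prefix=""):
    rows = conn.execute(
        "SELECT object_name FROM objects "
        "WHERE bucket_name = ? AND object_name LIKE ? ORDER BY object_name",
        (bucket, prefix + "%"),
    ).fetchall()
    return [r[0] for r in rows]
```

The compound primary key is what makes PUT idempotent and LIST a simple prefix scan; it is also what the later days progressively break apart.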

Day 2 - Metadata and Data Separation: Separated metadata (meta table) from data (data table) using object_id as the association key. Introduced Block tables to handle large files - splitting data into blocks with blockID as primary key, and using protobuf to compress blockID lists.
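The Day-2 split can be sketched as two stores: a meta table mapping each object to an ordered blockID list, and a data table mapping each blockID to its bytes. The dicts, `BLOCK_SIZE`, and ID scheme below are illustrative stand-ins for the real MySQL tables (where, per the article, the blockID list is additionally protobuf-compressed).

```python
BLOCK_SIZE = 4  # tiny for demonstration; real block sizes are far larger

meta_table = {}   # (bucket, key) -> {"object_id": ..., "block_ids": [...]}
data_table = {}   # block_id -> bytes
_next_id = iter(range(1, 1 << 30))  # stand-in for auto-increment IDs

def put_object(bucket, key, payload):
    """Split the payload into blocks; store blocks and the blockID list."""
    object_id = next(_next_id)
    block_ids = []
    for off in range(0, len(payload), BLOCK_SIZE):
        block_id = next(_next_id)
        data_table[block_id] = payload[off:off + BLOCK_SIZE]
        block_ids.append(block_id)
    meta_table[(bucket, key)] = {"object_id": object_id, "block_ids": block_ids}

def get_object(bucket, key):
    """Reassemble the object by reading its blocks in order."""
    meta = meta_table[(bucket, key)]
    return b"".join(data_table[b] for b in meta["block_ids"])
```

Separating the two tables means a large object no longer inflates a single row; only the compact blockID list lives next to the metadata.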

Day 3 - Abstraction Layer: Added RPC layer (S3 metadata service and IO service) between gateway and MySQL to decouple from specific storage implementations.
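The Day-3 decoupling can be sketched as two service interfaces that the gateway depends on, so the backing store behind them can be swapped without touching gateway code. The interface and class names are illustrative assumptions, not BOSS's actual RPC definitions.

```python
from abc import ABC, abstractmethod

class MetaService(ABC):
    """Metadata service interface (backed by MySQL at this stage)."""
    @abstractmethod
    def put_meta(self, bucket, key, block_ids): ...
    @abstractmethod
    def get_meta(self, bucket, key): ...

class IOService(ABC):
    """Block IO service interface."""
    @abstractmethod
    def write_block(self, block_id, data): ...
    @abstractmethod
    def read_block(self, block_id): ...

class Gateway:
    """Depends only on the service interfaces, never on a concrete store."""
    def __init__(self, meta: MetaService, io: IOService):
        self.meta, self.io = meta, io

    def get_object(self, bucket, key):
        block_ids = self.meta.get_meta(bucket, key)
        return b"".join(self.io.read_block(b) for b in block_ids)

# Trivial in-memory implementations, useful for tests and as a template
# for the later RocksDB-backed engine.
class InMemoryMeta(MetaService):
    def __init__(self): self.table = {}
    def put_meta(self, bucket, key, block_ids): self.table[(bucket, key)] = block_ids
    def get_meta(self, bucket, key): return self.table[(bucket, key)]

class InMemoryIO(IOService):
    def __init__(self): self.blocks = {}
    def write_block(self, block_id, data): self.blocks[block_id] = data
    def read_block(self, block_id): return self.blocks[block_id]
```

This indirection is what makes the Day-5 engine swap (MySQL to RocksDB) a localized change rather than a gateway rewrite.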

Day 4 - Sharding Strategy: Implemented two-level mapping using virtual shards (22,333 shards). First-level maps key to virtual shard via hash, second-level maps shard to storage node. This enables horizontal scaling and simplifies capacity expansion.
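The two-level mapping can be sketched in a few lines: the key-to-shard hash is fixed forever, while the shard-to-node table is the only thing rewritten during expansion. The round-robin node assignment and the `node-N` names below are illustrative assumptions; only the shard count comes from the article.

```python
import hashlib

NUM_SHARDS = 22333  # fixed virtual shard count from the article

# Second level: virtual shard -> storage node. Here shards are spread
# round-robin over three hypothetical nodes; in production this routing
# table is updated when capacity is added.
shard_to_node = {s: f"node-{s % 3}" for s in range(NUM_SHARDS)}

def shard_of(bucket, key):
    """First level: hash the object key to a stable virtual shard."""
    digest = hashlib.md5(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def node_of(bucket, key):
    return shard_to_node[shard_of(bucket, key)]
```

Expansion then only reassigns entries in `shard_to_node` and migrates those shards; clients never recompute which shard a key belongs to, which is what keeps scaling simple.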

Day 5 - Storage Engine Replacement: Replaced MySQL with RocksDB as the single-node storage engine, organized in three layers: RPC layer, shard layer, and engine layer.
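The three-layer organization of a storage node can be sketched as nested objects: an RPC layer that remote clients call, a shard layer that routes each key to its shard, and one engine per shard. A dict-backed `Engine` stands in for RocksDB here (which similarly exposes a per-instance key-value Put/Get interface); all names are illustrative.

```python
class Engine:
    """Engine layer: one embedded KV store per shard (RocksDB in BOSS)."""
    def __init__(self): self._kv = {}
    def put(self, key, value): self._kv[key] = value
    def get(self, key): return self._kv.get(key)

class ShardLayer:
    """Shard layer: routes a key to the engine owning its shard."""
    def __init__(self, num_shards=8):
        self.num_shards = num_shards
        self.engines = [Engine() for _ in range(num_shards)]
    def _engine(self, key):
        return self.engines[hash(key) % self.num_shards]
    def put(self, key, value): self._engine(key).put(key, value)
    def get(self, key): return self._engine(key).get(key)

class RPCLayer:
    """RPC layer: the node's external surface; delegates to the shard layer."""
    def __init__(self, shards): self.shards = shards
    def handle_put(self, key, value): self.shards.put(key, value)
    def handle_get(self, key): return self.shards.get(key)
```

Keeping one engine instance per shard is what lets a shard be migrated or recovered as a unit, independent of its neighbors on the same node.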

Day 6 - High Availability: Introduced resource pools and availability zones for fault isolation. Implemented 3-replica strategy with replicas distributed across different availability zones. Used star write model (write to all replicas, majority success) instead of complex consensus protocols, since keys are globally unique (auto-increment IDs) and no write conflicts occur.
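The star write model can be sketched as a fan-out to all replicas with a majority-acknowledgement check. Because every key is a globally unique auto-increment ID, two writers never race on the same key, which is why no consensus protocol is needed. The dict replicas and `failed_zones` parameter below are illustrative stand-ins for nodes in different availability zones.

```python
REPLICA_COUNT = 3
MAJORITY = REPLICA_COUNT // 2 + 1  # 2 of 3

replicas = [{} for _ in range(REPLICA_COUNT)]  # one per availability zone

def star_write(key, value, failed_zones=()):
    """Write to all replicas; succeed iff a majority acknowledge."""
    acks = 0
    for zone, replica in enumerate(replicas):
        if zone in failed_zones:   # simulated zone outage
            continue
        replica[key] = value
        acks += 1
    return acks >= MAJORITY
```

With replicas pinned to distinct availability zones, losing any single zone still leaves a majority, so writes keep succeeding through a zone failure.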

Tags: System Architecture, Sharding, High Availability, Replication, Distributed Storage, RocksDB, Object Storage, S3 Protocol
Written by Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.