
Bilibili HDFS Erasure Coding Strategy and Implementation

Bilibili reduced petabyte‑scale storage costs by back‑porting erasure‑coding patches to its HDFS 2.8.4 clients, deploying a parallel EC‑enabled 3.3 cluster, adding a Data Proxy service, intelligent routing, and block checking, and automating cold‑data migration, while noting write overhead and planning native acceleration.


1. Background

With Bilibili’s rapid business growth, the amount of data generated daily reaches petabyte scale. Cold data, which accounts for more than 30% of total storage, occupies a large share of high‑density storage despite its low access frequency. To further reduce storage costs, Bilibili decided to adopt Erasure Coding (EC) for HDFS.

What is EC? Erasure Coding (EC) is a forward error correction technique that achieves higher data reliability with lower redundancy than traditional triple replication. EC encodes n original data blocks into m additional parity blocks; any n of the resulting n + m blocks can reconstruct the original data, so the scheme tolerates up to m block failures.
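As a minimal illustration of the principle (not Bilibili’s implementation, which uses Reed–Solomon codes), single‑parity XOR coding with m = 1 shows how any one lost block can be rebuilt from the survivors:

```python
# Minimal erasure-coding sketch: n data blocks plus one XOR parity block (m = 1).
# Real HDFS EC uses Reed-Solomon (e.g. RS-6-3), which tolerates m > 1 failures;
# XOR is the simplest special case and conveys the same reconstruction idea.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks, byte by byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"block1", b"block2", b"block3"]   # n = 3 data blocks
parity = xor_blocks(data)                  # m = 1 parity block

# Simulate losing one block: XORing all survivors (data + parity) restores it.
lost_index = 1
survivors = [b for i, b in enumerate(data) if i != lost_index] + [parity]
recovered = xor_blocks(survivors)
assert recovered == data[lost_index]
```

Reed–Solomon generalizes this: with RS‑6‑3, any 6 of the 9 stored blocks suffice, so up to 3 simultaneous block failures are survivable.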

Impact on storage: For a 256 MB file, triple‑replication consumes 768 MB, while an RS‑6‑3‑1024k EC scheme reduces the consumption to 384 MB, cutting storage by about 50%.
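The arithmetic behind that comparison is simple: replication multiplies the file size by the replica count, while RS‑6‑3 stores 9 units for every 6 units of data (a 1.5x overhead factor). A quick sketch:

```python
# Storage-overhead arithmetic for 3x replication vs. RS-6-3 erasure coding,
# matching the 256 MB example in the article.

def replicated_bytes(file_mb, replicas=3):
    """Total stored size under plain replication."""
    return file_mb * replicas

def ec_bytes(file_mb, data_units=6, parity_units=3):
    """Total stored size under RS-style striping: (n + m) / n overhead."""
    return file_mb * (data_units + parity_units) / data_units

assert replicated_bytes(256) == 768   # 3x replication: 768 MB
assert ec_bytes(256) == 384.0         # RS-6-3-1024k: 384 MB, ~50% saving
```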

2. HDFS EC Overall Solution

2.1 Problems

The current HDFS cluster is version 2.8.4, which does not support EC. Upgrading to 3.x is complex and resource‑intensive.

Various HDFS clients (Java, C++, Presto) are in use, making a unified client version unrealistic in the short term.

2.2 Design Choice

To avoid a full cluster upgrade, Bilibili back‑ported necessary EC patches to the 2.8.4 codebase and deployed a parallel HDFS 3.3 EC cluster. The architecture is illustrated in Figure 2‑1.

2.3 Overall Process

The workflow (Figure 2‑2) includes deploying the EC‑enabled 3.3 cluster, adding EC mount points in the HDFS Router, back‑porting EC patches to the 2.8.4 client, and building an EC Data Proxy service for legacy clients.

3. HDFS EC Implementation Details

3.1 DN Proxy Implementation

Bilibili developed a Data Proxy service that allows all 2.8.4 clients (including C++ and Python) to read EC data transparently. The process is:

Patch the 2.8.4 client to support EC read/write and embed client version information in CallerContext.

The EC‑enabled NameNode validates the client version; older or unknown clients are redirected to the Data Proxy, while newer clients connect directly to DataNode Xceiver.

On the DataNode side, an ErasureCodingDataXceiver with a dedicated port aggregates blocks from multiple DNs and returns a reconstructed stream following the Replication‑Based Data Transfer Protocol.
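The version‑gated routing in the steps above can be sketched as follows. The CallerContext key, the version format, and the minimum EC‑capable version used here are illustrative assumptions, not Bilibili’s actual values:

```python
# Illustrative sketch of the NameNode-side routing decision. The "clientVersion"
# key and the MIN_EC_VERSION threshold are hypothetical placeholders.
MIN_EC_VERSION = (2, 8, 4)   # assumed: first client version with EC patches

def parse_version(caller_context):
    """Extract a comparable version tuple from CallerContext, or None."""
    raw = caller_context.get("clientVersion")
    if raw is None:
        return None
    try:
        return tuple(int(p) for p in raw.split("."))
    except ValueError:
        return None

def route_read(caller_context):
    """EC-capable clients go straight to the DataNode Xceiver;
    old or unknown clients are redirected through the Data Proxy."""
    version = parse_version(caller_context)
    if version is None or version < MIN_EC_VERSION:
        return "data-proxy"
    return "datanode-xceiver"

assert route_read({}) == "data-proxy"
assert route_read({"clientVersion": "2.7.0"}) == "data-proxy"
assert route_read({"clientVersion": "3.3.0"}) == "datanode-xceiver"
```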

3.2 Intelligent Data Routing System

Based on the existing HDFS Router, a custom EC mount point was added to route new writes to hot namespaces while keeping historical EC data accessible. An automated EC migration tool moves cold data to EC‑enabled namespaces during low‑traffic periods, guided by metadata analysis.
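A Router‑style mount point can be thought of as a longest‑prefix lookup from path to namespace. The mount entries below are hypothetical, just to sketch how an EC mount point coexists with hot namespaces:

```python
# Longest-prefix mount-table lookup, sketching how a Router-style EC mount
# point could direct a path to a namespace. All entries are hypothetical.
MOUNT_TABLE = {
    "/warehouse/cold": "ec-namespace",    # EC-backed 3.3 cluster
    "/warehouse":      "hot-namespace",   # replicated 2.8.4 cluster
}

def resolve_namespace(path):
    """Return the namespace of the longest mount prefix matching the path."""
    best = ""
    for prefix in MOUNT_TABLE:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return MOUNT_TABLE.get(best, "default-namespace")

assert resolve_namespace("/warehouse/cold/2023/part-0") == "ec-namespace"
assert resolve_namespace("/warehouse/hot_table/part-0") == "hot-namespace"
```

Under this model, the migration tool only needs to move data under an EC‑mounted prefix; clients keep using the same logical paths.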

3.3 Block Checker

The Block Checker component ensures the integrity of reconstructed EC blocks. It records CRC checksums of each block’s metadata in a KV store after finalization, then recomputes checksums during reconstruction and compares them to detect corruption.
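The record‑then‑verify cycle can be sketched like this; `zlib.crc32` and an in‑memory dict stand in for whatever checksum implementation and KV store Bilibili actually uses:

```python
# Sketch of the Block Checker idea: record each finalized block's CRC in a
# KV store, then recompute after reconstruction and compare to detect
# corruption. zlib.crc32 and the dict are illustrative stand-ins.
import zlib

kv_store = {}  # block_id -> CRC32 recorded at block finalization

def record_checksum(block_id, data):
    """Store the block's CRC32 when the block is finalized."""
    kv_store[block_id] = zlib.crc32(data)

def verify_reconstructed(block_id, data):
    """Recompute the CRC32 of a reconstructed block and compare."""
    return kv_store.get(block_id) == zlib.crc32(data)

record_checksum("blk_1001", b"original block bytes")
assert verify_reconstructed("blk_1001", b"original block bytes")
assert not verify_reconstructed("blk_1001", b"corrupted block bytes")
```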

3.4 Intelligent EC Cold‑Backup Strategy

Cold data is identified by three criteria: (1) file age exceeds a cold‑backup threshold, (2) recent OPEN operation count below a threshold, and (3) recent read count below a threshold. The conversion workflow (Figure 3‑4) records cold‑backup flags in the HDFS metastore, filters partitions larger than 3 MB, and re‑converts partitions that have been altered by other data‑governance jobs.
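The three criteria combine as a conjunction; a file must satisfy all of them to qualify. A sketch with placeholder thresholds (the article does not give Bilibili’s production values):

```python
# Sketch of the three cold-data criteria. The threshold defaults are
# placeholders, not Bilibili's production settings.
from dataclasses import dataclass

@dataclass
class FileStats:
    age_days: int       # time since file creation
    recent_opens: int   # OPEN operations in the recent audit window
    recent_reads: int   # reads in the recent audit window

def is_cold(f, min_age_days=90, max_opens=5, max_reads=5):
    """All three criteria must hold for a file to qualify for EC cold backup."""
    return (f.age_days > min_age_days
            and f.recent_opens < max_opens
            and f.recent_reads < max_reads)

assert is_cold(FileStats(age_days=365, recent_opens=0, recent_reads=1))
assert not is_cold(FileStats(age_days=10, recent_opens=0, recent_reads=0))
```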

4. Summary and Outlook

Since the rollout, hundreds of petabytes have been converted to EC, saving the cost of thousands of servers. While EC reduces storage cost, it introduces write‑performance overhead, is unsuitable for very small files, and increases the number of blocks per DataNode. Future work includes native acceleration of EC encoding/decoding, further stability improvements, and contributing back to the open‑source Hadoop community.

Tags: Distributed Systems, Storage Optimization, Big Data, Data Reliability, Erasure Coding, HDFS
Written by Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.
