
Design and Implementation of Baidu Cloud Block Storage EC System for Large‑Scale Data

This article presents Baidu Cloud's block storage architecture, comparing replication and erasure‑coding fault‑tolerance methods, detailing the challenges of applying EC to mutable block data, and describing a two‑layer append‑engine solution with selective 3‑replica caching, cost‑benefit compaction, and performance optimizations for low‑cost, high‑throughput storage.

DataFunTalk

The talk begins with a comparison of common data fault‑tolerance techniques, explaining why traditional RAID or simple replication is insufficient for large‑scale distributed storage and introducing erasure coding (especially Reed‑Solomon) as a cost‑effective alternative despite its higher compute and I/O overhead.
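The cost argument for erasure coding can be made concrete with a back-of-the-envelope comparison of redundancy factors. The sketch below is illustrative only; the RS(6, 3) parameters are a common choice in the literature, not necessarily the ones Baidu uses.

```python
# Rough storage-overhead comparison: N-replica vs. Reed-Solomon RS(k, m).
# Redundancy factor = raw bytes stored per user byte.

def replica_overhead(copies: int) -> float:
    # Full replication: every user byte is stored `copies` times.
    return float(copies)

def rs_overhead(k: int, m: int) -> float:
    # RS(k, m): a stripe of k data fragments plus m parity fragments;
    # any m fragment losses are survivable.
    return (k + m) / k

print(replica_overhead(3))  # 3.0x raw space, tolerates 2 failures
print(rs_overhead(6, 3))    # 1.5x raw space, tolerates 3 failures
```

Halving the raw-space bill while tolerating one more failure is the payoff; the price, as the talk notes, is the extra compute and I/O of encoding and decoding.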

It then outlines the specific challenges of using erasure coding for block storage: frequent small‑write modifications, read‑modify‑write cycles, and I/O amplification caused by encoding/decoding, as well as the need to support both large and small I/O patterns efficiently.
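The read-modify-write penalty for small writes can be counted directly. The sketch below tallies device operations for a naive in-place update of a single fragment in an RS(k, m) stripe; it is a simplified model, not a description of Baidu's actual write path.

```python
# Illustrative I/O cost of updating one data fragment of an RS(k, m)
# stripe in place: read the old fragment and all m old parities,
# recompute the parities, then write the fragment and parities back.

def rmw_io_ops(m: int) -> dict:
    reads = 1 + m    # old data fragment + m old parity fragments
    writes = 1 + m   # new data fragment + m new parity fragments
    return {"reads": reads, "writes": writes}

cost = rmw_io_ops(3)
print(cost)  # {'reads': 4, 'writes': 4}: 8 device ops for 1 logical write
```

An 8x amplification of a single small write is exactly why block storage, with its frequent small modifications, is a poor fit for straightforward in-place EC.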

Baidu's solution, built on the Aries EC system, introduces an index layer that redirects reads and writes to EC‑encoded segments and adopts an append‑only engine to avoid in‑place updates. Small writes are first absorbed by a three‑replica cache tier and later encoded in bulk, while large writes are encoded directly, reducing overall I/O amplification.
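The split between the replica cache and the direct EC path comes down to a size check at write time. The following is a minimal sketch of that routing decision; the threshold constant and tier interfaces are assumptions for illustration, not values from the talk.

```python
SMALL_WRITE_LIMIT = 128 * 1024  # illustrative threshold, not from the talk

def route_write(data: bytes, replica_tier: list, ec_tier: list) -> str:
    """Small writes land in the 3-replica cache (cheap to absorb, avoids
    read-modify-write on EC stripes) and are erasure-coded later in bulk;
    large writes fill whole stripes and are encoded directly."""
    if len(data) < SMALL_WRITE_LIMIT:
        replica_tier.append(data)
        return "replica-cache"
    ec_tier.append(data)
    return "ec-direct"

replica_cache, ec_log = [], []
print(route_write(b"x" * 4096, replica_cache, ec_log))       # replica-cache
print(route_write(b"x" * (1 << 20), replica_cache, ec_log))  # ec-direct
```

Batching the cached small writes into full stripes before encoding is what converts many would-be read-modify-write cycles into a single full-stripe write.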

The architecture employs a two‑layer append design: a logical append engine for EC segments and a physical append engine for the underlying storage, enabling efficient space allocation and leveraging hardware‑accelerated RS encoding.
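The two-layer append idea can be sketched as a logical engine that redirects every overwrite to a fresh append in an underlying physical log, with an index mapping logical blocks to their latest physical location. All class and field names below are hypothetical; this is a toy model of the layering, not Baidu's implementation.

```python
class PhysicalAppendLog:
    """Stand-in for the physical append engine: an append-only byte log."""
    def __init__(self):
        self.buf = bytearray()

    def append(self, data: bytes) -> int:
        off = len(self.buf)
        self.buf += data
        return off  # physical offset of the newly appended extent


class LogicalAppendEngine:
    """Stand-in for the logical append engine: maps logical block ids to
    physical extents. Overwrites append new data and update the index;
    nothing is modified in place, so EC segments stay immutable."""
    def __init__(self, phys: PhysicalAppendLog):
        self.phys = phys
        self.index = {}  # block id -> (physical offset, length)

    def write(self, block_id: int, data: bytes) -> None:
        off = self.phys.append(data)
        self.index[block_id] = (off, len(data))  # old extent becomes a hole

    def read(self, block_id: int) -> bytes:
        off, length = self.index[block_id]
        return bytes(self.phys.buf[off:off + length])


phys = PhysicalAppendLog()
eng = LogicalAppendEngine(phys)
eng.write(0, b"v1")
eng.write(0, b"v2")   # redirected append; b"v1" is now garbage (a hole)
print(eng.read(0))    # b'v2'
```

The holes left behind by redirected overwrites are precisely what the compaction algorithm described next must reclaim.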

To control write amplification, Baidu proposes a cost‑benefit compaction algorithm that selects segments based on both hole ratio and data age, outperforming a simple greedy approach and achieving lower amplification, especially under high space utilization.
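A scoring rule in this spirit is the classic cost-benefit policy from log-structured file systems: reward reclaimable space weighted by data age, penalize the cost of copying out live data. The talk's exact formula may differ; the sketch below only illustrates why age changes the greedy choice.

```python
def cost_benefit_score(hole_ratio: float, age: float) -> float:
    # Benefit: space reclaimed, weighted by age (cold, stable data is a
    # better candidate since it will not soon be dirtied again).
    # Cost: reading the segment and rewriting its live fraction.
    live = 1.0 - hole_ratio
    return (hole_ratio * age) / (1.0 + live)

def pick_segment(segments):
    # segments: list of (name, hole_ratio, age). A greedy policy would
    # look only at hole_ratio; cost-benefit also factors in age.
    return max(segments, key=lambda s: cost_benefit_score(s[1], s[2]))

segs = [("hot-dirty", 0.6, 1.0), ("cold-less-dirty", 0.4, 10.0)]
print(pick_segment(segs)[0])  # cold-less-dirty: fewer holes, but wins on age
```

Greedy would compact the hotter segment with more holes, only to see it fill with holes again; preferring the cold segment is what keeps amplification low as space utilization climbs.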

Experimental results show that the cost‑benefit strategy keeps write amplification below 1.5× at 95% space usage, while the layered append and selective caching mechanisms provide a balanced trade‑off between cost, performance, and reliability for massive block storage workloads.

Big Data · Compaction · Storage Architecture · Erasure Coding · Block Storage · Append Engine
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
