
Design and Implementation of Baidu Cloud Block Storage EC System for Large‑Scale Data

This article presents Baidu Cloud's block storage architecture, comparing replication and erasure‑coding fault‑tolerance methods, detailing the challenges of applying EC to mutable block data, and describing a two‑layer append‑engine solution with selective 3‑replica caching, cost‑benefit compaction, and performance optimizations for low‑cost, high‑throughput storage.

DataFunTalk

The talk begins with a comparison of common data fault‑tolerance techniques, explaining why traditional RAID or simple replication is insufficient for large‑scale distributed storage and introducing erasure coding (especially Reed‑Solomon) as a cost‑effective alternative despite its higher compute and I/O overhead.
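The cost argument for erasure coding can be made concrete with a back-of-the-envelope comparison of redundancy factors. The sketch below is illustrative only; the RS(6, 3) parameters are a common choice in the literature, not necessarily the ones Baidu uses.

```python
# Rough storage-overhead comparison: N-replica vs. Reed-Solomon RS(k, m).
# Redundancy factor = raw bytes stored per user byte.

def replica_overhead(copies: int) -> float:
    # Full replication: every user byte is stored `copies` times.
    return float(copies)

def rs_overhead(k: int, m: int) -> float:
    # RS(k, m): a stripe of k data fragments plus m parity fragments;
    # any m fragment losses are survivable.
    return (k + m) / k

print(replica_overhead(3))  # 3.0x raw space, tolerates 2 failures
print(rs_overhead(6, 3))    # 1.5x raw space, tolerates 3 failures
```

Halving the raw-space bill while tolerating one more failure is the payoff; the price, as the talk notes, is the extra compute and I/O of encoding and decoding.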

It then outlines the specific challenges of using erasure coding for block storage: frequent small‑write modifications, read‑modify‑write cycles, and I/O amplification caused by encoding/decoding, as well as the need to support both large and small I/O patterns efficiently.
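The read-modify-write penalty for small writes can be counted directly. The sketch below tallies device operations for a naive in-place update of a single fragment in an RS(k, m) stripe; it is a simplified model, not a description of Baidu's actual write path.

```python
# Illustrative I/O cost of updating one data fragment of an RS(k, m)
# stripe in place: read the old fragment and all m old parities,
# recompute the parities, then write the fragment and parities back.

def rmw_io_ops(m: int) -> dict:
    reads = 1 + m    # old data fragment + m old parity fragments
    writes = 1 + m   # new data fragment + m new parity fragments
    return {"reads": reads, "writes": writes}

cost = rmw_io_ops(3)
print(cost)  # {'reads': 4, 'writes': 4}: 8 device ops for 1 logical write
```

An 8x amplification of a single small write is exactly why block storage, with its frequent small modifications, is a poor fit for straightforward in-place EC.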

Baidu's solution, built on the Aries EC system, introduces an index layer that redirects reads and writes to EC‑encoded segments and adopts an append‑only engine to avoid in‑place updates. Small writes are first absorbed by a three‑replica cache tier and later encoded in bulk, while large writes are encoded directly, reducing overall I/O amplification.
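The split between the replica cache and the direct EC path comes down to a size check at write time. The following is a minimal sketch of that routing decision; the threshold constant and tier interfaces are assumptions for illustration, not values from the talk.

```python
SMALL_WRITE_LIMIT = 128 * 1024  # illustrative threshold, not from the talk

def route_write(data: bytes, replica_tier: list, ec_tier: list) -> str:
    """Small writes land in the 3-replica cache (cheap to absorb, avoids
    read-modify-write on EC stripes) and are erasure-coded later in bulk;
    large writes fill whole stripes and are encoded directly."""
    if len(data) < SMALL_WRITE_LIMIT:
        replica_tier.append(data)
        return "replica-cache"
    ec_tier.append(data)
    return "ec-direct"

replica_cache, ec_log = [], []
print(route_write(b"x" * 4096, replica_cache, ec_log))       # replica-cache
print(route_write(b"x" * (1 << 20), replica_cache, ec_log))  # ec-direct
```

Batching the cached small writes into full stripes before encoding is what converts many would-be read-modify-write cycles into a single full-stripe write.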

The architecture employs a two‑layer append design: a logical append engine for EC segments and a physical append engine for the underlying storage, enabling efficient space allocation and leveraging hardware‑accelerated RS encoding.
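The two-layer append idea can be sketched as a logical engine that redirects every overwrite to a fresh append in an underlying physical log, with an index mapping logical blocks to their latest physical location. All class and field names below are hypothetical; this is a toy model of the layering, not Baidu's implementation.

```python
class PhysicalAppendLog:
    """Stand-in for the physical append engine: an append-only byte log."""
    def __init__(self):
        self.buf = bytearray()

    def append(self, data: bytes) -> int:
        off = len(self.buf)
        self.buf += data
        return off  # physical offset of the newly appended extent


class LogicalAppendEngine:
    """Stand-in for the logical append engine: maps logical block ids to
    physical extents. Overwrites append new data and update the index;
    nothing is modified in place, so EC segments stay immutable."""
    def __init__(self, phys: PhysicalAppendLog):
        self.phys = phys
        self.index = {}  # block id -> (physical offset, length)

    def write(self, block_id: int, data: bytes) -> None:
        off = self.phys.append(data)
        self.index[block_id] = (off, len(data))  # old extent becomes a hole

    def read(self, block_id: int) -> bytes:
        off, length = self.index[block_id]
        return bytes(self.phys.buf[off:off + length])


phys = PhysicalAppendLog()
eng = LogicalAppendEngine(phys)
eng.write(0, b"v1")
eng.write(0, b"v2")   # redirected append; b"v1" is now garbage (a hole)
print(eng.read(0))    # b'v2'
```

The holes left behind by redirected overwrites are precisely what the compaction algorithm described next must reclaim.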

To control write amplification, Baidu proposes a cost‑benefit compaction algorithm that selects segments based on both hole ratio and data age, outperforming a simple greedy approach and achieving lower amplification, especially under high space utilization.
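A scoring rule in this spirit is the classic cost-benefit policy from log-structured file systems: reward reclaimable space weighted by data age, penalize the cost of copying out live data. The talk's exact formula may differ; the sketch below only illustrates why age changes the greedy choice.

```python
def cost_benefit_score(hole_ratio: float, age: float) -> float:
    # Benefit: space reclaimed, weighted by age (cold, stable data is a
    # better candidate since it will not soon be dirtied again).
    # Cost: reading the segment and rewriting its live fraction.
    live = 1.0 - hole_ratio
    return (hole_ratio * age) / (1.0 + live)

def pick_segment(segments):
    # segments: list of (name, hole_ratio, age). A greedy policy would
    # look only at hole_ratio; cost-benefit also factors in age.
    return max(segments, key=lambda s: cost_benefit_score(s[1], s[2]))

segs = [("hot-dirty", 0.6, 1.0), ("cold-less-dirty", 0.4, 10.0)]
print(pick_segment(segs)[0])  # cold-less-dirty: fewer holes, but wins on age
```

Greedy would compact the hotter segment with more holes, only to see it fill with holes again; preferring the cold segment is what keeps amplification low as space utilization climbs.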

Experimental results show that the cost‑benefit strategy keeps write amplification below 1.5× at 95% space usage, while the layered append and selective caching mechanisms provide a balanced trade‑off between cost, performance, and reliability for massive block storage workloads.

Big Data · Compaction · Storage Architecture · Erasure Coding · Block Storage · Append Engine
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
