How RAID and Replication Shape Distributed Storage Architecture
This article explores RAID and replication strategies in distributed storage, detailing stripe concepts, RAID0/1/10/5/6 configurations, the typical three‑node architecture with metadata and data servers, performance bottlenecks, reconstruction experiments, and practical mitigation techniques to ensure data integrity and high availability.
RAID and Replication
Striping takes chunks at the same offset on each disk and combines them into one logical space, so data is written across multiple disks as a stripe. Stripe length is the total number of sectors in a stripe across all member disks; stripe depth is the number of sectors each individual disk contributes to that stripe.
Why striping? It allows concurrent I/O across multiple disks, overcoming the throughput limit of a single disk.
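To make the stripe geometry concrete, here is a minimal sketch (illustrative names, not from any real driver) that maps a logical block address to a disk and a sector offset under simple RAID0-style striping:

```python
def locate(lba, num_disks, stripe_depth):
    """Map a logical block address (in sectors) to (disk index, sector offset)
    under RAID0-style striping. stripe_depth = sectors per disk per stripe;
    stripe length = num_disks * stripe_depth."""
    stripe_len = num_disks * stripe_depth
    stripe = lba // stripe_len                 # which full stripe
    within = lba % stripe_len                  # offset inside that stripe
    disk = within // stripe_depth              # which disk serves this chunk
    offset = stripe * stripe_depth + within % stripe_depth  # sector on that disk
    return disk, offset

# With 4 disks and stripe depth 8, consecutive 8-sector chunks land on
# successive disks, so four requests can proceed in parallel.
locate(0, 4, 8)    # first chunk on disk 0
locate(8, 4, 8)    # next chunk on disk 1
```

Consecutive logical addresses rotate across the disks, which is exactly why striping raises aggregate throughput.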
RAID0 uses striping for performance but provides no redundancy. RAID1 mirrors data across two disks, giving redundancy at the cost of write throughput, while reads can be served from either copy. Combining the two yields RAID10 (a stripe of mirrors), which tolerates more disk-failure combinations than RAID01 (a mirror of stripes).
Common RAID levels:
RAID5 (minimum 3 disks) – data striped with one parity block per stripe; tolerates 1 disk failure.
RAID6 (minimum 4 disks) – data striped with two independently computed parity blocks per stripe; tolerates 2 disk failures.
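The RAID5 parity mechanism above is just a bytewise XOR across the data blocks of a stripe, which is also why a single lost block can be rebuilt from the survivors. A minimal sketch:

```python
def parity(blocks):
    """XOR parity across equal-length blocks (RAID5-style: one parity
    block per stripe). XOR-ing the parity with all surviving blocks
    reproduces the missing one."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]    # three data blocks in one stripe
p = parity(data)                       # parity block written to a 4th disk

# Disk holding data[1] fails: rebuild it from parity + surviving blocks.
rebuilt = parity([p, data[0], data[2]])
assert rebuilt == data[1]
```

RAID6 adds a second parity computed with a different code (typically Reed–Solomon), so any two simultaneous disk losses remain recoverable; plain XOR alone cannot distinguish two missing blocks.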
These parity-based schemes consume CPU and memory for parity calculation, which is often offloaded to dedicated RAID cards or, in distributed environments, handled by the host CPUs.
Distributed Storage Architecture
The typical architecture consists of a client (or application), a metadata server (MDS), and data node servers.
Clients interact with the metadata server for signaling and with data nodes for media transfer. The metadata server directs clients to the appropriate data nodes.
Block, object, and file storage share a similar high‑level architecture, differing mainly in how they expose underlying storage:
Distributed block storage presents raw blocks to the client, which manages its own filesystem.
Object and file storage hide the block layer behind a filesystem, providing hierarchical (file) or flat (object) namespaces.
FusionStorage Example
FusionStorage comprises MDC, OSD, and Client components.
The MDC records OSD and disk status and synchronizes that state with VBS, which calculates data placement. MDC can be deployed as a single instance, centrally, or in distributed mode. A single MDC failure does not stop the storage service, but OSD state changes that occur during an MDC outage can affect I/O.
VBS (Virtual Block Service) is the core element that computes data block locations using consistent hashing, mapping logical addresses to partitions on disks.
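A minimal sketch of consistent-hash placement in the spirit of what VBS does (illustrative only; this is not FusionStorage's actual algorithm or partition count). Each disk owns many virtual points on a hash ring, and a logical address maps to the disk owning the next point clockwise:

```python
import hashlib
from bisect import bisect

def build_ring(disks, vnodes=64):
    """Consistent-hash ring: each disk owns `vnodes` virtual points, which
    spreads partitions evenly and limits data movement when disks change."""
    points = []
    for d in disks:
        for v in range(vnodes):
            h = int(hashlib.md5(f"{d}-{v}".encode()).hexdigest(), 16)
            points.append((h, d))
    points.sort()
    return points

def place(points, lba):
    """Map a logical block address to the disk owning the next ring point."""
    h = int(hashlib.md5(str(lba).encode()).hexdigest(), 16)
    keys = [p[0] for p in points]
    return points[bisect(keys, h) % len(points)][1]

ring = build_ring(["disk-a", "disk-b", "disk-c"])
owner = place(ring, 123456)        # deterministic: same address, same disk
```

The practical property this buys is that adding or removing a disk only remaps the partitions adjacent to its ring points, rather than reshuffling all data.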
Problems in Distributed Storage
Data safety relies on replicas. When a disk fails, the system must reconstruct missing replicas quickly. Large‑scale simultaneous failures can jeopardize data integrity.
Typical deployments use three replicas placed on different servers or racks, so the data survives up to two simultaneous server failures.
Reconstruction speed depends on the number of participating servers, network bandwidth, and foreground I/O load. Experiments show that with no foreground I/O, 1 TB can be rebuilt in about 12 minutes; under light I/O, reconstruction takes about 24 minutes.
Network bandwidth used for reconstruction is deliberately capped (often below 30% of a 10 Gbps link) to avoid impacting foreground I/O.
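A back-of-envelope model ties these numbers together: each participating server contributes a capped share of its link bandwidth, so rebuild time scales inversely with server count. The server count and cap below are illustrative assumptions, not measured values from the experiment:

```python
def rebuild_minutes(data_tb, servers, link_gbps, util=0.3):
    """Estimate rebuild time when each server contributes a fraction
    `util` of its network link to reconstruction traffic."""
    total_bytes = data_tb * 1e12
    rate = servers * link_gbps * 1e9 / 8 * util   # aggregate bytes/s
    return total_bytes / rate / 60

# Example: 4 servers, 10 Gbps links, 30% cap -> ~1.5 GB/s aggregate,
# i.e. roughly 11 minutes per TB, the same ballpark as the measured ~12.
rebuild_minutes(1, 4, 10)
```

Doubling the participating servers roughly halves the estimate, which matches the observation that reconstruction accelerates with scale.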
Mitigation Strategies
Replace aging x86 servers on a regular cycle to reduce the risk of correlated simultaneous failures, and expand capacity gradually to keep failure domains small.
When a server fails, place it in maintenance mode and replace it only after its data has been re-replicated. To avoid unnecessary data movement, configure a grace period (e.g. 15 minutes) before replica reconstruction starts, giving transient disk issues time to resolve on their own.
Full‑SSD distributed storage can improve performance but introduces SSD wear‑out concerns; proactive SSD replacement is required.
Overall, increasing the number of servers or disks involved in reconstruction accelerates recovery, but careful planning of replica placement, network capacity, and hardware refresh cycles is essential for reliable distributed storage.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and hope to accompany you throughout your operations career.