Why Data Safety Trumps Performance in Distributed Storage Systems
This article examines the inherent risks of distributed storage, emphasizes that data safety outweighs raw performance, and explains storage types, file system structures, media characteristics, and practical solutions such as SSD caching and SRVSAN architecture to mitigate those risks.
Introduction
The main risk of distributed storage stems not merely from data architecture flaws but from the tension among four requirements: shared access, massive data volumes, high performance demands, and the use of commodity x86 servers with inexpensive disks.
The analysis focuses on network and disk throughput from the host perspective, independent of any specific vendor's storage solution.
Key Insight
What is the most important metric for storage?
Many experts prioritize performance metrics such as IOPS and throughput, but the article argues that data safety is paramount: a fast storage system that loses data is catastrophic.
Storage Types
Four primary storage categories are described:
DAS (Direct‑Attached Storage)
NAS (Network‑Attached Storage)
SAN (Storage‑Area Network)
Object storage (combining SAN and NAS advantages)
The original illustrations trace how an application reads a file on Windows, from the application request down to physical disk access.
DAS integrates compute and storage in a single server. NAS separates them and provides a shared file system over Ethernet (CIFS/NFS). SAN offers block-level access and leaves the file system to the host, often over FC or high-speed Ethernet.
Memory communication > Bus communication > Network communication
Memory communication is fastest, followed by bus, then network. With modern Ethernet (10 Gb/s, 40 Gb/s), however, the network is usually no longer the bottleneck for storage.
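To make the bandwidth claim concrete, the following is a rough sketch that compares nominal link rates with the streaming rate of a single disk. The 150 MB/s disk figure is an illustrative assumption, not a value from the article, and protocol overhead is ignored.

```python
# Rough comparison of nominal network bandwidth with disk throughput.
# ASSUMED_HDD_MB_S is an illustrative assumption, not a figure from the article.

def link_bytes_per_sec(gbit_per_sec: float) -> float:
    """Convert a nominal link rate in Gb/s to bytes per second (no overhead)."""
    return gbit_per_sec * 1e9 / 8


def disks_to_saturate(gbit_per_sec: float, disk_mb_per_sec: float) -> float:
    """Number of disks streaming at disk_mb_per_sec needed to fill the link."""
    return link_bytes_per_sec(gbit_per_sec) / (disk_mb_per_sec * 1e6)


ASSUMED_HDD_MB_S = 150.0  # assumed sequential rate of one 7200 RPM SATA disk

for rate in (1, 10, 40):
    mb_s = link_bytes_per_sec(rate) / 1e6
    n = disks_to_saturate(rate, ASSUMED_HDD_MB_S)
    print(f"{rate:>2} Gb/s link ~ {mb_s:.0f} MB/s, filled by about {n:.1f} such disks")
```

Under these assumptions a single 10 Gb/s link carries the sequential output of several spinning disks at once, which is why the disks rather than the network tend to limit an HDD-backed node.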
File Systems
A file system acts as a "ledger" that manages file locations, access, and security. Examples include FAT, FAT32, and NTFS on Windows and ext, ext2, ext3, and ext4 on Linux.
Key structures such as the Master Boot Record, partition table, directory area, and data area are explained using a FAT32 example.
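As a small illustration of the first two structures, the sketch below parses the partition table of a classic MBR sector: 446 bytes of boot code, four 16-byte partition entries, and the 0x55 0xAA signature in the last two bytes. The sector here is synthetic and built in memory; `parse_mbr` and `build_demo_mbr` are hypothetical helper names, not part of any tool the article mentions.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446      # boot code occupies bytes 0..445
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"      # stored at bytes 510..511
# Entry layout: status(1) CHS-first(3) type(1) CHS-last(3) LBA-start(4) sectors(4)
ENTRY_FORMAT = "<B3xB3xII"


def parse_mbr(sector: bytes) -> list:
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        status, ptype, lba_start, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE]
        )
        if ptype != 0:            # type 0 marks an unused slot
            partitions.append({
                "index": i,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": count,
            })
    return partitions


def build_demo_mbr() -> bytes:
    """Build a synthetic MBR holding one bootable FAT32 (LBA) partition, type 0x0C."""
    sector = bytearray(SECTOR_SIZE)
    sector[446:462] = struct.pack(ENTRY_FORMAT, 0x80, 0x0C, 2048, 204800)
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


for part in parse_mbr(build_demo_mbr()):
    print(part)
```

The `lba_start` field is what leads from the partition table to the file system itself: the FAT32 boot sector, allocation tables, directory area, and data area all live inside the range it points to.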
Linux ext file systems use superblocks, inode tables, and data blocks; inode metadata includes size, owner IDs, permissions, timestamps, link count, and block pointers.
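Most of the inode fields listed above can be inspected from user space. The sketch below uses Python's standard `os.stat` on a temporary file; `describe_inode` is a hypothetical helper name. Block pointers are internal to the file system and are not exposed by `os.stat`, so they do not appear here.

```python
import os
import stat
import tempfile
import time


def describe_inode(path: str) -> dict:
    """Collect the inode metadata that os.stat exposes for path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "size_bytes": st.st_size,
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
        "link_count": st.st_nlink,
        "modified": time.ctime(st.st_mtime),
    }


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "demo.txt")
    with open(original, "w") as handle:
        handle.write("hello")
    # A hard link adds a second directory entry for the same inode.
    os.link(original, os.path.join(workdir, "alias.txt"))
    info = describe_inode(original)
    for key, value in info.items():
        print(f"{key:12} {value}")
```

The link count rises when the hard link is created because both names refer to one inode; the file's data is released only when the last link is removed.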
Distributed file systems like Hadoop's HDFS store data across multiple nodes, with a NameNode managing metadata and DataNodes storing blocks. The article outlines the block‑level write flow.
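The write flow can be sketched with a toy in-memory model: the client asks the NameNode to allocate a block and name its target DataNodes, then sends the block to the first DataNode, which forwards it down the replication pipeline. This is a simplified illustration and not the Hadoop API. The class names, the round-robin placement, and the tiny block size are assumptions made for the demo; real HDFS uses rack-aware placement, large blocks, and acknowledgements flowing back up the pipeline.

```python
class DataNode:
    """Toy DataNode: stores blocks in memory and forwards them downstream."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}                      # block_id -> bytes

    def write(self, block_id, data, downstream):
        self.blocks[block_id] = data
        if downstream:                        # pass the block along the pipeline
            downstream[0].write(block_id, data, downstream[1:])


class NameNode:
    """Toy NameNode: keeps only metadata (which nodes hold which block)."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}                       # path -> [(block_id, [node names])]
        self._next_id = 0
        self._start = 0

    def allocate_block(self, path):
        block_id = self._next_id
        self._next_id += 1
        n = len(self.datanodes)
        # Assumed placement policy: simple round robin over the DataNodes.
        targets = [self.datanodes[(self._start + i) % n]
                   for i in range(self.replication)]
        self._start = (self._start + 1) % n
        self.files.setdefault(path, []).append(
            (block_id, [t.name for t in targets]))
        return block_id, targets


def write_file(namenode, path, data, block_size):
    """Client side: split data into blocks and push each through a pipeline."""
    for offset in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        targets[0].write(block_id, data[offset:offset + block_size], targets[1:])


nodes = [DataNode(f"dn{i}") for i in range(4)]
namenode = NameNode(nodes, replication=3)
payload = b"abcdefghij"
write_file(namenode, "/demo.bin", payload, block_size=4)

for block_id, holders in namenode.files["/demo.bin"]:
    print(f"block {block_id} -> {holders}")
```

Note that the NameNode never touches file contents in this model. It records only where each block lives, which mirrors the metadata and data split described above.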
Storage Media
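Common media include magnetic disks, SSDs, optical discs, and tapes. Hard-disk performance depends on spindle speed, seek time, capacity, and interface speed: a higher spindle speed shortens rotational latency, and the higher recording density that usually accompanies larger capacity raises sequential throughput.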
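The effect of spindle speed and seek time on random I/O can be estimated with a common back-of-the-envelope model: each random request costs one average seek plus half a revolution. The sketch below applies it; the seek times are assumed typical values, not figures from the article, and data transfer time is ignored.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: the time for half a revolution."""
    return 0.5 * 60_000 / rpm


def estimated_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate random IOPS of one disk, ignoring transfer time."""
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


# (label, rpm, assumed average seek time in ms)
ASSUMED_DISKS = [
    ("7200 RPM SATA", 7200, 8.5),
    ("15000 RPM SAS", 15000, 3.5),
]

for label, rpm, seek_ms in ASSUMED_DISKS:
    iops = estimated_random_iops(rpm, seek_ms)
    print(f"{label}: rotational latency {rotational_latency_ms(rpm):.2f} ms, "
          f"about {iops:.0f} random IOPS")
```

Under these assumptions a single spinning disk delivers on the order of a hundred random IOPS, several orders of magnitude below the SSD figures cited later in the article.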
Performance metrics focus on IOPS and throughput, which vary with block size, read/write pattern, and sequential vs random access.
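IOPS and throughput are linked through block size: throughput is the number of operations per second multiplied by the bytes moved per operation. The sketch below shows why the same device can report very different numbers for different workloads. The workload figures are illustrative assumptions.

```python
def throughput_mb_s(iops: float, block_size_kib: float) -> float:
    """Throughput in MB/s given operations per second and block size in KiB."""
    return iops * block_size_kib * 1024 / 1e6


# (description, IOPS, block size in KiB), all assumed for illustration
ASSUMED_WORKLOADS = [
    ("random 4 KiB reads on one HDD", 80, 4),
    ("sequential 1 MiB reads on one HDD", 150, 1024),
    ("random 4 KiB reads on an SSD", 100_000, 4),
]

for description, iops, block_kib in ASSUMED_WORKLOADS:
    print(f"{description}: {throughput_mb_s(iops, block_kib):.1f} MB/s")
```

A small-block random workload is therefore best judged by IOPS, while a large-block sequential workload is best judged by throughput; quoting either figure without the block size and access pattern says little.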
SSDs have no moving parts and are controlled electronically, so they deliver far higher IOPS and lower latency than HDDs, whose performance is bounded by mechanical seek and rotation. SSD types (SLC, MLC, TLC) store one, two, and three bits per cell respectively and differ in cost, endurance, and speed; the article notes that enterprise SSDs typically use MLC.
SSD caching can dramatically boost performance. The article cites an SRVSAN deployment that pairs a PCIe 2.0 SSD (1.2 TB, 260k IOPS, 1.55 GB/s read) with 7200 RPM SATA disks, highlighting the disparity between cache and bulk storage capabilities.
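How much a cache helps depends heavily on its hit ratio. The sketch below uses a deliberately simple model in which each request is served either by the SSD or by the disk tier and the average service time is the hit-weighted mix of the two. The 260k IOPS figure is the SSD number cited above; the 80 IOPS figure for the SATA tier is an assumption, and queueing, parallelism across disks, and write-back behaviour are all ignored.

```python
def effective_iops(hit_ratio: float, cache_iops: float, backend_iops: float) -> float:
    """Effective IOPS when a fraction hit_ratio of requests is served by the cache."""
    avg_service_time = hit_ratio / cache_iops + (1 - hit_ratio) / backend_iops
    return 1 / avg_service_time


SSD_IOPS = 260_000        # PCIe SSD figure cited in the article
ASSUMED_HDD_IOPS = 80     # assumed random IOPS of the SATA tier

for hit_ratio in (0.0, 0.5, 0.9, 0.99, 1.0):
    iops = effective_iops(hit_ratio, SSD_IOPS, ASSUMED_HDD_IOPS)
    print(f"hit ratio {hit_ratio:>4.0%}: about {iops:,.0f} IOPS")
```

In this model the misses dominate: even at a 90% hit ratio the result stays far closer to the disk figure than to the SSD figure, so the cache must absorb nearly all of the hot working set before the SSD's speed shows through.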
Conclusion
Data safety must be the primary concern in storage design, and combining high‑cost SSD caches with low‑cost SATA capacity layers offers a practical solution to meet performance and reliability requirements.