Distributed File Systems: Overview, Design Requirements, Architecture Models, and Key Considerations
This article provides a comprehensive overview of distributed file systems, covering their historical evolution, essential design requirements, centralized and decentralized architecture models, persistence, scalability, high availability, performance optimization, security, and additional practical aspects such as space allocation, file deletion, small‑file handling, and deduplication.
Overview
Distributed file systems are a foundational application of distributed computing, with HDFS and GFS the best-known examples; understanding their design principles offers valuable insight for similar systems.
Many other products exist beyond HDFS and GFS, each with distinct characteristics worth studying.
The article analyzes the problems to solve, available solutions, and criteria for choosing among them.
Past
In the 1980s, systems like Sun's Network File System (NFS) separated disks from hosts, enabling larger capacity, host switching, data sharing, backup, and disaster recovery.
With the rise of the Internet, massive data growth required horizontal scaling, fault tolerance, high availability, persistence, and elasticity.
Requirements
Conform to POSIX file interface standards.
Transparent to users, behaving like a local file system.
Persistence to prevent data loss.
Scalability to accommodate growing data pressure.
Robust security mechanisms.
Data consistency across reads.
Additional desirable features include large storage capacity, high concurrency, high performance, and efficient hardware utilization.
Architecture Models
Two main routes exist: centralized and decentralized.
1. Centralized (e.g., GFS)
The master node handles file location, metadata, fault detection, and data migration. Clients query the master for chunk locations, then communicate directly with chunk servers for data transfer.
Master nodes typically do not participate in data reads/writes, reducing bottlenecks.
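The two-step read path described above can be sketched as follows. This is a minimal in-memory illustration, not GFS's actual protocol: the `Master`, `ChunkServer`, and `client_read` names are hypothetical, the chunk size is an assumed value, and reads that cross a chunk boundary are not handled.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size, for illustration only


@dataclass
class Master:
    """Holds metadata only: (path, chunk index) -> list of chunk server ids."""
    locations: dict = field(default_factory=dict)

    def lookup(self, path: str, offset: int) -> list:
        return self.locations[(path, offset // CHUNK_SIZE)]


@dataclass
class ChunkServer:
    """Holds chunk bytes, keyed by (path, chunk index)."""
    chunks: dict = field(default_factory=dict)

    def read(self, path: str, chunk_index: int, start: int, length: int) -> bytes:
        return self.chunks[(path, chunk_index)][start:start + length]


def client_read(master, servers, path, offset, length):
    # Step 1: ask the master where the chunk lives (metadata only).
    replicas = master.lookup(path, offset)
    # Step 2: fetch the bytes directly from one replica, bypassing the master.
    server = servers[replicas[0]]
    return server.read(path, offset // CHUNK_SIZE, offset % CHUNK_SIZE, length)
```

The master never touches file bytes in this sketch, which is the property that keeps it off the data path.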
2. Decentralized (e.g., Ceph)
All nodes are autonomous; the cluster consists of a single node type where each node stores both metadata and data (RADOS). Ceph uses the CRUSH algorithm to compute which nodes hold each object, so clients locate data without consulting a central coordinator.
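To illustrate coordinator-free placement, the sketch below uses rendezvous (highest-random-weight) hashing. It is a much simpler stand-in and is not the CRUSH algorithm; the `place` function and the node names are hypothetical. The point it shows is that any client holding the same node list computes the same placement on its own.

```python
import hashlib


def place(object_name: str, nodes: list, replicas: int = 3) -> list:
    """Pick `replicas` nodes for an object by highest hash score."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Every client ranks the nodes identically, so no lookup service is needed.
    return sorted(nodes, key=score, reverse=True)[:replicas]
```

Removing a node that was not selected for an object leaves that object's placement unchanged, which limits data movement when cluster membership changes.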
Persistence
Data is persisted via multiple replicas. Challenges include ensuring replica consistency, dispersing replicas to avoid correlated failures, detecting corrupted or stale replicas, and selecting the appropriate replica for client reads.
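Synchronous writes, which acknowledge only after every replica has been updated, guarantee consistency but increase latency.
Parallel writes (the client sends to all replicas at once) and chain writes (each replica forwards the data to the next) shorten the time a synchronous write takes.
Quorum writes with W+R>N acknowledge after W of N replicas are updated and read from R replicas; because every read set overlaps every write set, a smaller W lowers write latency at the cost of a larger R on reads.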
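The overlap argument behind W+R>N can be checked with a small sketch. It assumes each replica is a `(version, value)` tuple and that the reader keeps the highest version it sees; the function names are hypothetical.

```python
from itertools import combinations


def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read set shares at least one replica with every write set."""
    return w + r > n


def quorum_read(contacted: list):
    """Return the value carrying the highest version among the contacted replicas."""
    return max(contacted, key=lambda replica: replica[0])[1]


# N=3, W=2: the write reached two replicas, the third is stale.
replicas = [(2, "new"), (2, "new"), (1, "old")]

# R=2 satisfies W+R>N, so every possible read set sees the new value.
results_r2 = {quorum_read(list(c)) for c in combinations(replicas, 2)}

# R=1 does not, so some read sets return stale data.
results_r1 = {quorum_read(list(c)) for c in combinations(replicas, 1)}
```

Here `results_r2` contains only `"new"`, while `results_r1` contains both `"new"` and `"old"`.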
Scalability
1. Storage node scaling
Balance load across nodes using metrics such as disk usage, CPU, and network traffic.
Prefer nodes with lower utilization when allocating new space.
Perform data migration when nodes become overloaded.
Introduce new nodes gradually (pre‑heat) to avoid sudden load spikes.
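A minimal sketch of utilization-based allocation, under stated assumptions: the weights, the migration threshold, and the `NodeStats` fields are illustrative choices, not values taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    disk: float     # fraction of disk used, 0.0 to 1.0
    cpu: float      # fraction of CPU busy, 0.0 to 1.0
    network: float  # fraction of link capacity in use, 0.0 to 1.0


def load_score(node: NodeStats, weights=(0.5, 0.25, 0.25)) -> float:
    """Combine the three metrics into one number; the weights are assumptions."""
    w_disk, w_cpu, w_net = weights
    return w_disk * node.disk + w_cpu * node.cpu + w_net * node.network


def pick_for_allocation(nodes: list) -> NodeStats:
    """Prefer the least-loaded node when allocating new space."""
    return min(nodes, key=load_score)


def needs_migration(node: NodeStats, threshold: float = 0.8) -> bool:
    """Flag a node whose combined load exceeds the (assumed) threshold."""
    return load_score(node) > threshold
```

A gradual warm-up for new nodes could be layered on top, for example by temporarily inflating a new node's score so that it does not absorb every new allocation at once.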
2. Central node scaling
Use larger data blocks to reduce metadata volume (e.g., 64 MiB chunks in GFS; HDFS defaulted to 64 MiB in early releases and to 128 MiB since Hadoop 2).
Adopt multi‑level metadata hierarchies.
Deploy stateless master nodes sharing a common storage backend (e.g., iRODS).
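The effect of block size on metadata volume is simple arithmetic. The sketch below counts the block entries needed to track 1 PiB of data; the 100 bytes per entry is a hypothetical figure chosen only to make the comparison concrete.

```python
PIB = 2**50
MIB = 2**20
BYTES_PER_ENTRY = 100  # hypothetical per-block metadata cost


def block_count(total_bytes: int, block_size: int) -> int:
    """Number of blocks needed, rounding up."""
    return -(-total_bytes // block_size)


def metadata_bytes(total_bytes: int, block_size: int) -> int:
    return block_count(total_bytes, block_size) * BYTES_PER_ENTRY


print(block_count(PIB, 64 * MIB))     # 16777216
print(block_count(PIB, 1 * MIB))      # 1073741824
print(metadata_bytes(PIB, 64 * MIB))  # 1677721600 (about 1.6 GiB)
print(metadata_bytes(PIB, 1 * MIB))   # 107374182400 (100 GiB)
```

Under these assumptions, 64 MiB blocks keep the metadata small enough to hold in one machine's memory, while 1 MiB blocks would not.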
High Availability
Both master and storage nodes require HA. Master HA can be achieved via active‑passive replication or shared storage; storage HA is inherently provided by replica mechanisms discussed in persistence.
Persist metadata in databases or log‑based storage with periodic snapshots.
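Log-based persistence with periodic snapshots can be sketched as follows. The in-memory list and dict stand in for durable storage, and the `MetadataStore` class is a hypothetical illustration rather than any system's real format.

```python
class MetadataStore:
    def __init__(self):
        self.state = {}     # live namespace: path -> attributes
        self.log = []       # operations recorded since the last snapshot
        self.snapshot = {}  # last checkpointed copy of the namespace

    @staticmethod
    def _mutate(state, op, path, attrs):
        if op == "put":
            state[path] = attrs
        elif op == "delete":
            state.pop(path, None)

    def apply(self, op, path, attrs=None):
        self.log.append((op, path, attrs))  # record the operation first
        self._mutate(self.state, op, path, attrs)

    def checkpoint(self):
        """Take a snapshot and truncate the log so that recovery stays fast."""
        self.snapshot = dict(self.state)
        self.log.clear()

    def recover(self):
        """Rebuild the namespace from the snapshot plus the log tail."""
        state = dict(self.snapshot)
        for op, path, attrs in self.log:
            self._mutate(state, op, path, attrs)
        return state
```

A standby master that receives the same snapshot and log can call `recover()` to reach the same namespace before taking over.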
Performance Optimization & Cache Consistency
Cache file contents in memory.
Prefetch data blocks.
Batch read/write requests.
Caching introduces consistency problems such as lost updates and stale reads, which can be mitigated by read-only policies or by locking at an appropriate granularity.
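As one illustration of lock granularity, the sketch below takes one lock per file path so that a read-modify-write on one file cannot interleave with another writer on the same file, while writers on different files proceed independently. The `LockedStore` class is hypothetical and runs in a single process; a distributed system would need a lock service instead.

```python
import threading
from collections import defaultdict


class LockedStore:
    def __init__(self):
        self._data = defaultdict(bytes)
        self._locks = defaultdict(threading.Lock)  # one lock per path
        self._table_lock = threading.Lock()        # guards the lock table

    def _lock_for(self, path):
        with self._table_lock:
            return self._locks[path]

    def append(self, path, payload):
        with self._lock_for(path):
            current = self._data[path]            # read
            self._data[path] = current + payload  # modify and write back

    def read(self, path):
        with self._lock_for(path):
            return self._data[path]
```

A coarser lock (one for the whole store) would be simpler but would serialize unrelated files; a finer one (per byte range) allows more concurrency at the cost of more bookkeeping.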
Security
Distributed file systems serve multiple tenants and must enforce robust access control. Common models include:
DAC (discretionary access control, e.g., Unix-style user/group/other permissions).
MAC (mandatory access control, e.g., SELinux).
RBAC (role-based access control).
Systems like Ceph and Hadoop integrate these models, sometimes extending them with custom solutions.
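The DAC model can be illustrated with a Unix-style read check. This is a simplified sketch: the `can_read` helper is hypothetical, and it ignores the superuser, ACLs, and directory traversal rights.

```python
import stat


def can_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: set) -> bool:
    """Apply the owner class first, then group, then other."""
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

With mode `0o640`, the owner and members of the file's group can read the file, and everyone else is denied.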
Other Topics
1. Space Allocation
Two approaches: contiguous space (fast I/O but prone to fragmentation) and linked‑list space (low fragmentation but slower random reads). Index tables (i‑nodes) mitigate linked‑list drawbacks.
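The random-read difference between linked-list and indexed allocation can be seen in how each finds the n-th block of a file. The block numbers and function names below are hypothetical.

```python
def nth_block_linked(next_ptr: dict, first: int, n: int) -> int:
    """Follow n pointers from the first block: cost grows with n."""
    block = first
    for _ in range(n):
        block = next_ptr[block]
    return block


def nth_block_indexed(index: list, n: int) -> int:
    """Look the block up directly in the index table: constant cost."""
    return index[n]


# One file stored in blocks 7 -> 2 -> 9, in that order.
next_ptr = {7: 2, 2: 9, 9: None}
index = [7, 2, 9]
```

Both calls return block 9 for `n = 2`, but the linked version had to visit blocks 7 and 2 first.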
2. File Deletion
Logical deletion with delayed reclamation is common, allowing recovery before permanent removal.
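Logical deletion with delayed reclamation can be sketched as a trash area with a retention period. The `Trash` class and its explicit `now` timestamps are illustrative assumptions; a real system would also defer freeing the underlying blocks.

```python
class Trash:
    def __init__(self, retention_seconds: float):
        self.retention = retention_seconds
        self.live = {}     # path -> content
        self.deleted = {}  # path -> (content, deletion timestamp)

    def delete(self, path: str, now: float) -> None:
        """Mark the file deleted; its data is kept for later recovery."""
        self.deleted[path] = (self.live.pop(path), now)

    def restore(self, path: str) -> None:
        content, _ = self.deleted.pop(path)
        self.live[path] = content

    def reclaim(self, now: float) -> list:
        """Permanently drop entries whose retention period has elapsed."""
        expired = [p for p, (_, t) in self.deleted.items()
                   if now - t >= self.retention]
        for p in expired:
            del self.deleted[p]
        return expired
```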
3. Small‑File Distributed Systems
Store small files as metadata pointing to offsets within large data blocks, leveraging the efficiency of big‑block storage while keeping metadata lightweight.
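A minimal sketch of this packing scheme, assuming an in-memory `bytearray` as the large block and a dict as the metadata index; the `PackedBlock` class is hypothetical.

```python
class PackedBlock:
    def __init__(self):
        self.block = bytearray()  # one large block shared by many small files
        self.index = {}           # file name -> (offset, length)

    def put(self, name: str, data: bytes) -> None:
        # Metadata records only where the file sits inside the block.
        self.index[name] = (len(self.block), len(data))
        self.block += data

    def get(self, name: str) -> bytes:
        offset, length = self.index[name]
        return bytes(self.block[offset:offset + length])
```

Each small file costs one index entry instead of one block, so metadata grows with the number of files while the data path still reads from large blocks.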
4. File Fingerprinting & Deduplication
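Cryptographic hashes such as MD5 and SHA-256 produce fingerprints that identify identical content for deduplication and integrity checks, while similarity hashes such as SimHash and MinHash detect near-duplicate content and support version comparison.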
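Exact-match deduplication can be sketched with SHA-256 from Python's standard `hashlib`. The `DedupStore` class is a hypothetical, whole-file illustration; production systems often fingerprint fixed-size or variable-size chunks instead.

```python
import hashlib


class DedupStore:
    def __init__(self):
        self.blobs = {}  # fingerprint -> content, stored once
        self.names = {}  # file name -> fingerprint

    def put(self, name: str, data: bytes) -> str:
        fingerprint = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(fingerprint, data)  # skip if already stored
        self.names[name] = fingerprint
        return fingerprint

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]
```

Two files with identical bytes share a single stored blob, and recomputing the hash on read doubles as an integrity check.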
Conclusion
Distributed file systems involve a wide range of considerations beyond those covered here; the article provides a concise analysis to guide future design decisions and encourages deeper exploration of specific solutions.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
