Understanding Distributed Storage: HDFS, Ceph, GlusterFS, and FastDFS
This article provides a concise technical overview of four major distributed storage solutions—HDFS, Ceph, GlusterFS, and FastDFS—covering their architecture, key features, pros and cons, and typical use cases for large‑scale data processing and storage.
HDFS
HDFS (Hadoop Distributed File System) is the default file system for the Hadoop ecosystem. It is optimized for high‑throughput sequential access to very large files and for batch‑processing frameworks such as MapReduce and Spark.
Architecture
NameNode – stores the namespace metadata (file‑to‑block mapping, permissions); block locations are not persisted but are rebuilt at runtime from DataNode block reports. It is a single point of control; high availability is achieved by configuring an active‑standby pair.
DataNode – runs on each storage host and stores HDFS blocks on local disks. It reports block reports and heartbeats to the NameNode.
Secondary/Backup NameNode – periodically merges the edit log with the filesystem image (fsimage) to limit the size of the edit log. Despite the name, it is a checkpointing helper, not a failover standby.
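To make the division of labor concrete, here is a minimal sketch using the third-party Python hdfs package (a WebHDFS client); the NameNode address, user name, and paths are illustrative assumptions, not values from this article:

```python
# Minimal WebHDFS sketch with the third-party "hdfs" package
# (pip install hdfs). Host, user, and paths are illustrative.
from hdfs import InsecureClient

# 9870 is the default NameNode HTTP port in Hadoop 3.x.
client = InsecureClient('http://namenode.example.com:9870', user='hadoop')

# Writes go through the NameNode for block allocation, then the
# client streams block data directly to DataNodes.
with client.write('/data/logs/events.log', overwrite=True) as writer:
    writer.write(b'first log line\n')

# Reads fetch block locations from the NameNode, data from DataNodes.
with client.read('/data/logs/events.log') as reader:
    print(reader.read())
```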
Key Features
Large‑file, sequential read/write optimization; typical block size 128 MiB or 256 MiB.
Centralized metadata service provides fast namespace operations but requires HA.
Default three‑replica replication gives fast recovery; optional erasure‑coding (EC) reduces storage overhead for cold data.
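As a quick illustration of that trade-off (numbers are illustrative; RS‑6‑3 matches the shape of Hadoop 3's default RS‑6‑3‑1024k EC policy):

```python
# Raw-capacity cost of 3-way replication vs. Reed-Solomon RS(6,3)
# erasure coding. Logical data size is an arbitrary example.
logical_tb = 100

replicated_raw = logical_tb * 3          # 300 TB raw -> 200% overhead
ec_raw = logical_tb * (6 + 3) / 6        # 150 TB raw ->  50% overhead

print(f'3x replication: {replicated_raw} TB raw')
print(f'RS(6,3) EC:     {ec_raw:.0f} TB raw')
```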
Advantages
Very high throughput for bulk data.
Mature ecosystem; tight integration with Hadoop, Hive, Spark, Flink, etc.
Simple data model (files → blocks).
Limitations
Poor performance for many small files and random‑access workloads.
NameNode is a bottleneck; HA adds operational complexity.
3× raw‑capacity storage cost from default replication unless EC is enabled.
Typical Use Cases
Offline big‑data analytics, ETL pipelines, log aggregation.
Data‑lake storage where files are written once and read many times.
Ceph
Ceph provides a unified storage platform that simultaneously offers object, block, and POSIX file system interfaces.
Architecture
RADOS – the underlying Reliable Autonomic Distributed Object Store; all data is stored as objects in placement groups.
CRUSH algorithm – deterministic data placement that eliminates the need for a central metadata server.
Ceph Monitors (MON) – maintain the cluster maps (OSD map, CRUSH map, monitor map) and reach consensus on them via Paxos.
Object Storage Daemons (OSD) – run on each storage node, store objects, handle replication/EC, and serve client I/O.
RBD – block device interface for virtual machines and containers.
CephFS – POSIX‑compatible file system built on top of RADOS, with a separate metadata server (MDS) for directory hierarchy.
RGW (RADOS Gateway) – S3/Swift compatible object‑storage gateway.
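A minimal sketch of talking to RADOS directly with the Python librados binding (shipped with Ceph as python3-rados); the pool and object names are illustrative:

```python
# Write and read one object via librados (python3-rados).
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    # CRUSH maps the object to a placement group and on to OSDs;
    # the client computes placement itself, with no lookup service.
    ioctx = cluster.open_ioctx('mypool')
    try:
        ioctx.write_full('hello-object', b'stored via RADOS')
        print(ioctx.read('hello-object'))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```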
Key Features
Decentralized data placement via CRUSH: clients compute object locations themselves, so there is no central lookup bottleneck (a toy sketch follows this list).
Supports both replication and erasure coding; configurable per pool.
Scales to thousands of OSDs and petabytes of data.
Native integration with OpenStack Cinder, Nova, and Glance.
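To build intuition for CRUSH-style placement, the sketch below is a drastic simplification (real CRUSH walks a weighted cluster topology with straw2 buckets); it only demonstrates that placement is a pure function of the object name and the cluster map, so every client computes the same OSD set without asking a central server:

```python
import hashlib

# Toy stand-in for CRUSH: deterministic, map-driven placement.
# NOT the real algorithm; it only shows why no lookup service is needed.
def place(obj_name, pg_count, osds, replicas=3):
    # Hash the object name into a placement group...
    pg = int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % pg_count
    # ...then derive a replica set deterministically from the PG id.
    start = pg % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]

# Every client with the same "map" (pg_count, osds) gets the same answer.
print(place('hello-object', pg_count=128, osds=[0, 1, 2, 3, 4, 5]))
```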
Advantages
High scalability and fault tolerance.
Single platform for object, block, and file storage.
Self‑healing and self‑balancing.
Challenges
Complex deployment and tuning; requires careful sizing of network, CPU, and OSD hardware.
Performance for small files and high‑concurrency metadata may need additional tuning (e.g., MDS cache sizing, OSD journal/WAL and DB device placement).
Typical Use Cases
Cloud infrastructure storage back‑ends (OpenStack, Kubernetes).
Enterprise‑grade distributed storage for backups, archives, and big data.
Unified object‑block‑file storage for multi‑tenant environments.
GlusterFS
GlusterFS aggregates storage servers into a single global namespace and presents it as a POSIX‑compatible file system.
Architecture
Each server runs a glusterd daemon and exports bricks (directories) that are combined into a volume.
Volume types: Distributed (data spread across bricks), Replicated (copies on multiple bricks), Striped (chunks across bricks; deprecated in recent releases in favor of sharding), and combinations (e.g., distributed‑replicated).
Clients can mount the volume via FUSE or access it through NFS/SMB gateways.
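Because a mounted volume is just a POSIX file system, no special SDK is needed; a sketch assuming the volume is already FUSE-mounted (server, volume, and mount point are illustrative):

```python
# Assumes a prior mount such as:
#   mount -t glusterfs server1.example.com:/myvolume /mnt/gluster
# Ordinary file operations then go through the FUSE client, which
# talks to the bricks directly.
from pathlib import Path

mount = Path('/mnt/gluster')
(mount / 'reports').mkdir(exist_ok=True)
(mount / 'reports' / 'q1.txt').write_text('quarterly numbers\n')
print((mount / 'reports' / 'q1.txt').read_text())
```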
Key Features
Elastic scaling – add or remove bricks without downtime.
Self‑healing replication; automatic re‑balance when bricks change.
Supports geo‑replication for disaster recovery.
Advantages
Simple installation; works on commodity Linux servers.
Good protocol compatibility (POSIX, NFS, SMB).
Suitable for small‑ to medium‑scale workloads.
Limitations
GlusterFS has no dedicated metadata server; metadata is stored with the files in extended attributes, so operations such as directory listings must fan out across many bricks, and performance can degrade at very large scale or under high metadata concurrency.
Advanced features (e.g., tiering, snapshots) may require additional configuration and have stability considerations.
Typical Use Cases
File‑sharing services, media asset storage, home directories.
Environments that need POSIX semantics together with NFS/SMB access.
FastDFS
FastDFS is a lightweight, high‑performance distributed file system optimized for massive numbers of small files and high request rates.
Architecture
Tracker server – acts as a scheduling and naming service; stores only metadata about storage groups, server status, and remaining capacity. It does not hold file data.
Storage server – stores the actual file content; each storage server belongs to a group and reports its status to the tracker.
Clients first obtain a storage server address from a tracker, then upload/download files directly to/from that storage node.
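A sketch of that flow using the community fdfs_client-py package (API names follow that project; Python 3 forks such as py3Fdfs differ slightly, and the config path and file names are illustrative):

```python
# Upload/download via FastDFS with the community fdfs_client-py package.
# /etc/fdfs/client.conf lists the tracker addresses.
from fdfs_client.client import Fdfs_client

client = Fdfs_client('/etc/fdfs/client.conf')

# The client asks a tracker for a storage server, then uploads
# directly to that node; the returned file_id encodes the location.
result = client.upload_by_filename('avatar.jpg')
file_id = result['Remote file_id']   # e.g. group1/M00/00/00/....jpg
print('stored as', file_id)

# Downloads work the same way: tracker lookup, then a direct read.
client.download_to_file('avatar_copy.jpg', file_id)
```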
Key Features
Simple deployment: only a few tracker nodes and many storage nodes.
Supports file upload/download, append, and delete via a custom binary protocol; HTTP access is typically served through the fastdfs‑nginx‑module.
Built‑in load balancing across storage servers within a group.
Advantages
Low overhead and easy to scale horizontally.
Excellent performance for scenarios with millions of small files (e.g., image or video hosting).
Considerations
Limited advanced features compared with Ceph or HDFS (no native block storage, limited snapshot or erasure‑coding support).
Cluster state (groups, server status, capacity) is centralized in the tracker nodes; high availability requires running multiple trackers.
Typical Use Cases
Internet applications that store user‑generated media (avatars, product images, short videos).
Content delivery platforms needing fast read/write of small files.
Architect Chen
Sharing over a decade of architecture experience from Baidu, Alibaba, and Tencent.