Why Ceph Is the Backbone of Modern Cloud‑Native Storage Solutions
This article traces Ceph’s evolution from its academic origins to widespread adoption in cloud‑native environments, explains the four stages of storage development, details Ceph’s unified object, block, and file interfaces, and describes its architecture—including monitors, OSDs, CRUSH algorithm, placement groups, and high‑availability features.
Ceph grew out of Sage Weil’s Ph.D. research; its first results were published in 2004, and the project was later contributed to the open‑source community. After years of development it gained support from many cloud‑computing vendors and is now widely used. Both Red Hat and OpenStack integrate with Ceph to provide backend storage for virtual‑machine images. In 2014, however, while OpenStack was booming, Ceph was not yet widely accepted: it was considered unstable (as of the fourth release, Dumpling v0.67), and its novel, complex architecture raised concerns about data safety and consistency in production.
As OpenStack grew rapidly, it injected new life into Ceph; more users adopted Ceph as the underlying shared storage for OpenStack, and the Ceph community in China flourished. In recent years, although OpenStack’s hype has faded, the rise of cloud‑native technologies, especially Kubernetes, has revived Ceph as a foundational storage layer for stateful workloads.
Storage Development History
Enterprise storage has evolved through four major stages:
DAS: Direct‑Attached Storage, the first generation, in which external storage such as a tape array is attached directly to a server over a SCSI or FC bus and acts as an extension of that server.
NAS: Network‑Attached Storage, which accesses shared file servers over network protocols like NFS or CIFS, exposing storage as mounted directories.
SAN: Storage Area Network, using IP‑SAN or FC‑SAN to connect to storage servers via TCP/IP or Fibre Channel, offering high performance and scalability at higher cost.
Object Storage: Designed for massive volumes of unstructured data (images, video, audio) that need petabyte‑scale capacity and the ability to scale out by adding nodes.
Each stage introduced solutions tailored to the era’s needs, with distinct advantages and trade‑offs.
What Is Ceph?
Ceph meets the three common enterprise storage needs (block, file, and object) in a single unified system. As the official definition states, “Ceph uniquely delivers object, block, and file storage in one unified system.” The three storage interfaces are:
1. CEPH OBJECT STORE (RGW)
RESTful interface
S3‑ and Swift‑compatible APIs
S3‑style subdomains
Unified S3/Swift namespace
User management
Usage tracking
Striped objects
Cloud solution integration
Multi‑site deployment
Multi‑site replication
2. CEPH BLOCK DEVICE (RBD)
Thin‑provisioned storage
Supports up to 16 exabytes
Configurable striping (default 4 MiB)
In‑memory caching
Snapshots
Copy‑on‑write cloning
Kernel driver support (rbd module)
KVM/libvirt integration for OpenStack, CloudStack, etc.
Backend for cloud solutions
Incremental backup
Disaster recovery with multisite asynchronous replication
3. CEPH FILE SYSTEM (CephFS)
POSIX‑compliant semantics
Metadata separated from data
Dynamic rebalancing
Subdirectory snapshots
Configurable striping
Kernel driver support
FUSE support
NFS/CIFS deployment
Hadoop integration (replace HDFS)
In plain terms, Ceph offers three storage interfaces: RBD for block storage, RGW for object storage, and CephFS for file storage, each with its own features.
Ceph Storage Architecture
Ceph is designed to be highly reliable and easy to manage, and it is free software. It scales to thousands of clients accessing petabytes to exabytes of data. A Ceph cluster consists of commodity hardware nodes running intelligent daemons, which communicate with each other to replicate data and rebalance it dynamically.
The cluster’s core components are Ceph Monitors (ceph‑mon) and Ceph OSDs (Object Storage Daemons, ceph‑osd). Monitors act as the control center and hold the cluster’s state; OSDs store the actual data. Monitors distribute that state to clients and update it when OSDs join or fail. A typical high‑availability deployment uses an odd number of monitors, 2n + 1 (for example three or five), so that a majority can still form a quorum after n monitors fail.
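The 2n + 1 rule can be checked with a little arithmetic. The sketch below assumes that monitors need a strict majority to form a quorum, as majority-based consensus protocols generally do; the function names are illustrative and not part of any Ceph API.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of the given number of monitors."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Print cluster size, quorum size, and tolerated failures side by side.
    for n in range(1, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, three monitors tolerate one failure and five tolerate two, while four monitors still tolerate only one. An even count adds a node without adding fault tolerance, which is why odd sizes are preferred.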
Monitors maintain several maps:
Monitor Maps – node status (retrievable via ceph mon dump)
OSD Maps – data‑node status (via ceph osd dump)
PG Maps – placement‑group mapping (via ceph pg dump)
CRUSH Maps – data placement rules (via ceph osd crush dump)
MDS Maps – CephFS metadata server status (via ceph mds dump; on newer releases, ceph fs dump)
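For scripting, the same dumps can be requested as JSON. The following is a minimal sketch, assuming the ceph CLI is installed on the host and has access to a keyring; MAP_COMMANDS, build_command, and fetch_map are hypothetical helper names. The MDS entry uses ceph fs dump rather than ceph mds dump, on the assumption that a recent release is in use.

```python
import json
import subprocess

# Map name -> ceph CLI subcommand. The "mds" entry uses "fs dump" as an
# assumption about newer releases; older clusters may need "mds dump".
MAP_COMMANDS = {
    "mon": ["mon", "dump"],
    "osd": ["osd", "dump"],
    "pg": ["pg", "dump"],
    "crush": ["osd", "crush", "dump"],
    "mds": ["fs", "dump"],
}


def build_command(map_name: str) -> list:
    """Return the argv list that asks the ceph CLI for one map as JSON."""
    if map_name not in MAP_COMMANDS:
        raise KeyError(f"unknown map: {map_name}")
    return ["ceph", *MAP_COMMANDS[map_name], "--format", "json"]


def fetch_map(map_name: str):
    """Run the ceph CLI and parse its JSON output (needs a reachable cluster)."""
    result = subprocess.run(
        build_command(map_name), check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)
```

Calling fetch_map("mon") on a node with cluster access would return the monitor map as parsed JSON; the exact fields depend on the Ceph release, so they are not assumed here.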
Taken together, a Ceph cluster runs the following daemons:
Ceph Monitors (ceph‑mon)
Ceph OSDs (ceph‑osd)
Ceph MDS (ceph‑mds) for CephFS metadata
Ceph RGW (radosgw) for the object‑storage gateway
Ceph Manager (ceph‑mgr) for cluster and performance monitoring
Each monitor keeps a copy of the cluster map, so if one monitor fails, the remaining monitors keep the cluster operational as long as a majority of them is still available.
Ceph Data Storage
All data in Ceph is stored as objects, regardless of whether it originates from block, object, or file interfaces, or a custom librados implementation. Each object is stored on an Object Storage Device (OSD) managed by the OSD daemon.
When a client writes data, the object’s name is hashed to determine a placement‑group ID (PGID). The CRUSH algorithm then maps the PG to suitable OSDs, ensuring that replicas are placed on different OSDs. Because placement is computed per PG rather than per object, the cluster tracks and rebalances a bounded number of PGs instead of every individual object.
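The two-level mapping can be illustrated with a small sketch. It is not Ceph’s implementation: zlib.crc32 stands in for Ceph’s object hash, the bit mask assumes the PG count is a power of two, and pg_to_osds is a simple highest-score ranking used as a stand-in for CRUSH, which additionally takes the cluster hierarchy and weights into account. All names are illustrative.

```python
import hashlib
import zlib


def object_to_pg(object_name: str, pg_num: int) -> int:
    """Hash the object name and mask it down to a PG id.

    Assumes pg_num is a power of two so a bit mask is enough.
    """
    if pg_num < 1 or pg_num & (pg_num - 1):
        raise ValueError("pg_num must be a power of two in this sketch")
    return zlib.crc32(object_name.encode()) & (pg_num - 1)


def pg_to_osds(pg_id: int, osds: list, replicas: int = 3) -> list:
    """Stand-in for CRUSH: rank OSDs by a per-(PG, OSD) hash, keep the top ones.

    The result is deterministic and never lists the same OSD twice.
    """
    if replicas > len(osds):
        raise ValueError("not enough OSDs for the requested replica count")

    def score(osd) -> int:
        digest = hashlib.sha256(f"{pg_id}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(osds, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    pg = object_to_pg("example-object", pg_num=128)
    print(pg, pg_to_osds(pg, osds=list(range(10))))
```

Any client that knows the PG count and the OSD list computes the same answer without asking a central lookup service, which is the property the PG and CRUSH design relies on.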
Key steps in the write path:
A file is split into multiple objects (e.g., a 100 GiB file becomes 25 600 objects of 4 MiB each).
Each object receives a unique OID built from the file’s inode number and the object’s sequence number within the file.
The OID is hashed and masked to obtain a PGID.
CRUSH computes the optimal OSD(s) for the PG, typically creating multiple replicas.
The object is written to the OSD(s) selected for its PG, completing the write operation.
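The first two steps above come down to simple arithmetic, sketched here under the assumption of plain striping with a fixed 4 MiB object size and one stripe per object. The OID format (hexadecimal inode, a dot, then an eight-digit hexadecimal object number) is an illustrative assumption, and the function names are hypothetical.

```python
OBJECT_SIZE = 4 * 1024**2  # 4 MiB, the object size used in the example above


def object_count(file_size: int, object_size: int = OBJECT_SIZE) -> int:
    """Number of objects needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // object_size)


def locate(offset: int, object_size: int = OBJECT_SIZE) -> tuple:
    """Map a byte offset in the file to (object number, offset inside it)."""
    return divmod(offset, object_size)


def object_id(inode: int, object_number: int) -> str:
    """Build an OID from the inode and object number (format is an assumption)."""
    return f"{inode:x}.{object_number:08x}"


if __name__ == "__main__":
    print(object_count(100 * 1024**3))        # objects in a 100 GiB file
    print(locate(OBJECT_SIZE + 5))            # which object holds this byte
    print(object_id(0x10000000000, 255))      # OID of object number 255
```

For a 100 GiB file, object_count returns 25600, matching the figure in the first step. The resulting OIDs are the names that get hashed to a PGID in the third step.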
Future articles will explore Ceph’s scalability, high availability, dynamic cluster management, erasure coding, and cache tiering, along with the details of coded blocks and data recovery.
MaGe Linux Operations
