An Overview of Ceph Architecture and Its Storage Interfaces
This article introduces Ceph, an open‑source distributed storage system that provides object, block, and file interfaces. It explains the core RADOS components, outlines Ceph's main advantages (unified storage, high scalability, strong reliability, and high performance), and describes how Ceph handles object and block storage in cloud environments.
0. Introduction
Ceph is an open‑source distributed storage system that provides file, block, and object interfaces, making it a natural storage backend for cloud platforms such as OpenStack and CloudStack, or a standalone SAN/NAS solution.
1. Architecture Overview
1.1 Supported Interfaces
Object storage: radosgw, compatible with the S3 REST API for uploading and downloading files.
File system: POSIX interface, allowing a Ceph cluster to be mounted as a shared file system.
Block storage: RBD, available via the kernel rbd module or librbd, supporting snapshots and clones and behaving like a regular hard disk.
1.2 Advantages
1.2.1 Unified Storage
Ceph unifies object, block, and file storage under a single distributed system.
1.2.2 High Scalability
Easy capacity expansion, capable of managing thousands of servers and exabyte‑scale storage.
1.2.3 Strong Reliability
Data is protected by multiple strongly consistent replicas or erasure coding (EC), placed across hosts, racks, and data centers; the cluster self‑manages and self‑heals automatically, with no single point of failure.
1.2.4 High Performance
Parallel reads/writes across many replicas increase IOPS and throughput; clients interact directly with OSDs, eliminating metadata bottlenecks.
Note: the above are design advantages; actual performance should be validated by large‑scale testing. Recommended stable Ceph versions at the time of writing: 0.67.0, 0.80.7, and 0.94.2.
1.3 RADOS Cluster
The RADOS cluster is the storage core of Ceph, consisting of OSDs (object storage devices), MONs (monitor nodes that track cluster state), and MDSs (metadata servers for CephFS). OSDs handle data reads/writes, scrubbing, recovery, and heartbeats; MONs maintain consistent maps of the cluster; MDSs manage only metadata.
Example status output:
# ceph -s
cluster 72d3c6b5-ea26-4af5-9a6f-7811befc6522
health HEALTH_WARN
clock skew detected on mon.mon1, mon.mon3
monmap e3: 3 mons at {mon1=10.25.25.236:6789/0,mon2=10.25.25.235:6789/0,mon3=10.25.25.238:6789/0}
election epoch 16, quorum 0,1,2 mon2,mon1,mon3
osdmap e330: 44 osds: 44 up, 44 in
pgmap v124351: 1024 pgs, 1 pools, 2432 GB data, 611 kobjects
6543 GB used, 153 TB / 160 TB avail
1024 active+clean
2. What Is Object Storage?
Object storage handles unstructured data such as images, audio/video, and documents; it follows a write‑once, read‑many pattern and uses a bucket concept in which objects are stored and retrieved by unique IDs.
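These semantics can be sketched with a minimal in‑memory model. This is a toy illustration of the bucket/object‑ID concept, not the API of radosgw or any real object store:

```python
class Bucket:
    """Toy model of object-storage semantics: whole objects keyed by unique IDs."""

    def __init__(self, name: str):
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, obj_id: str, data: bytes) -> None:
        # Object storage is write-once/read-many: there are no partial updates,
        # so a put with an existing ID replaces the entire object.
        self._objects[obj_id] = data

    def get(self, obj_id: str) -> bytes:
        return self._objects[obj_id]


bucket = Bucket("photos")
bucket.put("2015/cat.jpg", b"...jpeg bytes...")
print(bucket.get("2015/cat.jpg"))
```

The key point is that the unit of access is the whole object, addressed by ID, rather than a byte range on a block device.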
3. What Is Block Storage?
Block storage presents disks or disk arrays as logical blocks; Ceph’s RBD provides a distributed block device (similar to SAN) that can be attached to virtual machines for high‑performance, reliable storage.
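By contrast, block semantics can be sketched as fixed‑size blocks addressed by index, which is roughly how an RBD image exposes a byte range to a virtual machine. The class and sizes below are illustrative assumptions, not librbd's API:

```python
class BlockDevice:
    """Toy fixed-size-block device: data is read and written in whole blocks
    addressed by index, the way a block device presents logical blocks."""

    def __init__(self, size_blocks: int, block_size: int = 4096):
        self.block_size = block_size
        # Start zero-filled, like a freshly created image.
        self.blocks = [bytes(block_size)] * size_blocks

    def write_block(self, idx: int, data: bytes) -> None:
        assert len(data) == self.block_size, "writes are whole blocks"
        self.blocks[idx] = data

    def read_block(self, idx: int) -> bytes:
        return self.blocks[idx]


dev = BlockDevice(size_blocks=1024)  # a 4 MiB toy image
dev.write_block(0, b"\x01" * 4096)
```

In real Ceph, an RBD image is striped across many RADOS objects, so these block writes become object writes distributed over the cluster.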
4. Ceph Component Interaction
Ceph solves connectivity, discovery, and data exchange in a distributed environment, similar to how CPU, memory, and I/O devices cooperate in a single machine.
4.1 RADOS – Component Diagram
(The original component diagram is not reproduced here; the components it shows, OSDs, MONs, and MDSs, are described in section 1.3.)
4.2 CRUSH – Data Placement Algorithm
CRUSH maps objects to OSDs without a central lookup table. Objects belong to pools, and each pool contains placement groups (PGs) that are distributed across OSDs. The algorithm takes (pool, object) as input, computes a PG id by hashing the object name and taking it modulo the pool's PG count (hash(obj) mod pg_num), then runs CRUSH(pg, crushmap) to select the target OSDs based on the hierarchical cluster map, OSD weights, and replication rules.
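The two-step mapping can be sketched in Python. This is a toy model, not Ceph's actual CRUSH implementation: real CRUSH walks a weighted hierarchical crushmap, whereas here rendezvous (highest-random-weight) hashing stands in for the OSD-selection step.

```python
import hashlib


def place_object(pool_id: int, obj_name: str, pg_num: int,
                 osds: list[int], replicas: int = 3):
    """Sketch of Ceph-style placement: object -> PG -> OSDs."""
    # Step 1: stable hash of the object name, modulo the pool's PG count.
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    pg = h % pg_num

    # Step 2: choose `replicas` distinct OSDs for this (pool, pg).
    # Rendezvous hashing stands in for the real CRUSH hierarchy walk:
    # every OSD gets a deterministic score, the top-scoring ones win.
    def score(osd: int) -> int:
        return int(hashlib.md5(f"{pool_id}.{pg}.{osd}".encode()).hexdigest(), 16)

    targets = sorted(osds, key=score, reverse=True)[:replicas]
    return pg, targets


pg, targets = place_object(1, "myobject", 1024, list(range(44)))
print(pg, targets)
```

Because both steps are pure functions of the inputs and the cluster map, any client can compute an object's location independently, which is what eliminates the central lookup table.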
4.3 Ceph Read/Write – Data Interaction
Client read/write operations go through the primary OSD and require successful writes to multiple replicas for strong consistency.
Internal cluster reads/writes involve OSD data synchronization, verification, heartbeat checks, and MON‑OSD state synchronization.
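The client write path above can be sketched as follows. This is a simplified model of primary-copy replication, not Ceph's implementation: the client sends the write to the primary OSD, the primary fans it out to the replica OSDs, and the write is acknowledged only after every replica has persisted it, which is what gives strong consistency.

```python
class OSD:
    """Toy OSD: a key-value store that acknowledges each write."""

    def __init__(self, osd_id: int):
        self.osd_id = osd_id
        self.store: dict[str, bytes] = {}

    def write(self, key: str, value: bytes) -> bool:
        self.store[key] = value
        return True  # ack once the write is durable


def client_write(primary: OSD, replicas: list[OSD],
                 key: str, value: bytes) -> bool:
    """Primary-copy write: succeed only when the primary AND all replicas ack."""
    acks = [primary.write(key, value)]
    acks += [r.write(key, value) for r in replicas]
    return all(acks)


primary, reps = OSD(0), [OSD(1), OSD(2)]
ok = client_write(primary, reps, "obj1", b"data")
```

A read can then be served by the primary alone, since no write is acknowledged before all replicas hold the same data.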