Red Hat Ceph Storage Architecture Guide – Overview and Core Concepts
This article provides a comprehensive overview of Red Hat Ceph's distributed object storage architecture, covering storage pools, CRUSH placement, authentication, I/O workflows, internal operations, client interfaces, data striping, erasure coding, high availability, and encryption mechanisms for secure, scalable deployments.
Chapter 1 Overview
Red Hat Ceph is a distributed object storage system designed for performance, reliability, and scalability. It supports multiple client interfaces, including native language bindings (C/C++, Java, Python), a RESTful S3/Swift gateway, a block device, and a file system.
It scales to thousands of clients and petabyte-to-exabyte data volumes. Its core components are Ceph OSD daemons, which handle data replication, rebalancing, recovery, and monitoring, and Ceph Monitors, which maintain the cluster maps.
Chapter 2 Storage Cluster Architecture
2.1 Storage Pools
Storage pools logically partition data and can be configured for replicated or erasure‑coded durability. Pools define the type (replicated or EC), placement groups (PGs) and CRUSH rule sets that control data placement, fault domains and performance domains.
2.2 Authentication (CephX)
CephX provides mutual authentication using shared secret keys, similar to Kerberos, without encrypting data in transit.
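The core idea — each side proving possession of a shared secret without ever sending it over the wire — can be sketched with an HMAC challenge-response. This is a simplification: real CephX issues session keys and tickets, and the key names below are invented for illustration.

```python
import hashlib
import hmac
import os

def prove(secret: bytes, challenge: bytes) -> bytes:
    """Return an HMAC over the challenge using the shared secret."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()

# Monitor and client both hold the shared secret (as with a CephX keyring entry).
secret = os.urandom(32)

# The monitor sends a random challenge; the client answers with an HMAC proof.
challenge = os.urandom(16)
client_proof = prove(secret, challenge)

# The monitor recomputes the HMAC locally; the secret never crosses the wire.
assert hmac.compare_digest(client_proof, prove(secret, challenge))
```

Note that, as the section says, this authenticates the parties but does not encrypt the data path.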
2.3 Placement Groups (PGs)
Objects are mapped to PGs, which are then mapped to an acting set of OSDs via the CRUSH algorithm, enabling dynamic rebalancing and high scalability.
2.4 CRUSH
CRUSH deterministically maps PGs to OSDs based on hierarchical bucket definitions, allowing placement across fault and performance domains.
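Real CRUSH walks a hierarchy of buckets using the rjenkins hash and straw2 selection; as a rough illustration of the key property — placement is *computed*, not looked up in a table — here is a toy Python sketch. The SHA-1 hash and rendezvous-style ranking are stand-ins, not Ceph's actual algorithm.

```python
import hashlib

def pg_for_object(obj_name: str, pg_num: int) -> int:
    # Ceph uses rjenkins hashing; plain SHA-1 stands in here.
    h = int.from_bytes(hashlib.sha1(obj_name.encode()).digest()[:4], "big")
    return h % pg_num

def osds_for_pg(pg: int, osds: list, size: int) -> list:
    # Stand-in for CRUSH: rank OSDs by a deterministic per-PG score and
    # take the top `size` as the acting set.
    def score(osd):
        return hashlib.sha1(f"{pg}:{osd}".encode()).digest()
    return sorted(osds, key=score)[:size]

pg = pg_for_object("myobject", pg_num=128)
acting = osds_for_pg(pg, osds=list(range(10)), size=3)
# The same inputs always yield the same acting set -- no central lookup needed.
assert acting == osds_for_pg(pg, list(range(10)), 3)
```

Because every client computes the same answer from the cluster map, clients can contact the right OSDs directly without consulting a metadata server.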
2.5 I/O Operations
Clients obtain the latest cluster map from monitors, compute the target PG and primary OSD using CRUSH, and interact directly with the primary OSD for reads and writes.
2.5.1 Replicated I/O
The primary OSD writes the data locally and forwards it to the replica OSDs; it acknowledges the write to the client only after the replicas confirm their writes.
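A minimal in-memory simulation of this primary-copy flow (the `OSD` class and its dict-backed store are invented for illustration, not the OSD daemon's interface):

```python
class OSD:
    """Toy stand-in for an OSD daemon with an in-memory object store."""
    def __init__(self, osd_id: int):
        self.osd_id = osd_id
        self.store = {}

    def write(self, oid: str, data: bytes) -> bool:
        self.store[oid] = data
        return True  # acknowledgment

def replicated_write(acting_set: list, oid: str, data: bytes) -> bool:
    """Primary-copy replication: the first OSD in the acting set is primary."""
    primary, *replicas = acting_set
    primary.write(oid, data)                       # primary writes locally
    acks = [r.write(oid, data) for r in replicas]  # and forwards to replicas
    return all(acks)  # client is acked only after every replica acks

osds = [OSD(i) for i in range(3)]
assert replicated_write(osds, "obj1", b"hello")
assert all(o.store["obj1"] == b"hello" for o in osds)
```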
2.5.2 Erasure‑coded I/O
Data is split into K data blocks and M coding blocks; the primary OSD encodes and distributes blocks across OSDs, allowing reconstruction if up to M OSDs fail.
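Ceph's erasure-code plugins (e.g. jerasure) implement general Reed-Solomon codes over arbitrary K and M; the idea can be sketched with the simplest case, K=2 data chunks plus M=1 XOR parity chunk, which survives the loss of any one chunk:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes):
    """Split into K=2 data chunks and compute an M=1 XOR parity chunk."""
    half = len(data) // 2
    k1, k2 = data[:half], data[half:]
    return k1, k2, xor(k1, k2)

def recover_k1(k2: bytes, parity: bytes) -> bytes:
    # Any single lost chunk can be rebuilt from the remaining two.
    return xor(k2, parity)

k1, k2, parity = encode(b"ABCDEFGH")
assert recover_k1(k2, parity) == k1
```

The storage overhead here is 1.5x, versus 3x for three-way replication, which is the usual motivation for erasure coding on colder data.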
2.6 Internal Operations
2.6.1 Heartbeat
OSDs exchange heartbeats with their peer OSDs and report failures to the monitors, which mark OSDs up or down in the cluster map.
2.6.2 Sync
OSDs synchronize PG state internally without manual intervention.
2.6.3 Data Rebalancing and Recovery
When OSDs are added or fail, CRUSH recalculates placement and only a fraction of data moves, ensuring balanced load.
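The claim that only a fraction of data moves can be illustrated with rendezvous (HRW) hashing, which shares CRUSH's key property that placement is recomputed deterministically rather than stored. This is not CRUSH itself, just a stand-in with similar movement behavior when the OSD set changes:

```python
import hashlib

def place(obj: str, osds: list) -> int:
    """Rendezvous hashing: each object goes to the OSD with the highest
    per-(object, OSD) score. Adding an OSD only remaps the objects that
    the new OSD now wins."""
    return max(osds, key=lambda o: hashlib.sha1(f"{obj}:{o}".encode()).digest())

objs = [f"obj-{i}" for i in range(1000)]
before = {o: place(o, list(range(9))) for o in objs}   # 9 OSDs
after = {o: place(o, list(range(10))) for o in objs}   # one OSD added
moved = sum(before[o] != after[o] for o in objs)
# Only roughly 1/10 of the objects move (those won by the new OSD).
assert 0 < moved < len(objs) // 2
```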
2.6.4 Scrubbing
Periodic scrubbing validates object metadata and data integrity, detecting corruption.
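A deep scrub essentially recomputes checksums of object data on each replica and compares them across the placement group. A toy sketch follows; the replica contents and the majority-vote check are invented for illustration:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Deep scrub: recompute each replica's checksum and compare across the PG.
replicas = {0: b"payload", 1: b"payload", 2: b"payl0ad"}  # OSD 2 has bit rot
digests = {osd: checksum(d) for osd, d in replicas.items()}

# A replica whose digest matches no other replica is flagged inconsistent.
inconsistent = {osd for osd, d in digests.items()
                if list(digests.values()).count(d) == 1}
assert inconsistent == {2}
```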
2.7 High Availability
Ceph maintains data availability through multiple replicas, monitor quorum, and the CephX authentication mechanism.
Chapter 3 Client Architecture
3.1 Native Protocol and Librados
librados provides direct, parallel object access, with operations including pool management, snapshots, read/write, extended attributes (xattrs), and key/value handling.
3.2 Object Watch/Notify
Clients can register watches on objects and receive notifications for changes.
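The watch/notify pattern can be sketched as a callback registry. The class below is a hypothetical in-process stand-in, not the librados API; in Ceph the registry lives on the OSD holding the object, and clients such as librbd use it to coordinate (for example, watching an image's header object):

```python
class ObjectWatchRegistry:
    """Toy watch/notify: clients register callbacks on an object name and
    are notified when another client calls notify() on that object."""
    def __init__(self):
        self.watchers = {}

    def watch(self, oid: str, callback):
        self.watchers.setdefault(oid, []).append(callback)

    def notify(self, oid: str, message: str):
        for cb in self.watchers.get(oid, []):
            cb(message)

events = []
reg = ObjectWatchRegistry()
reg.watch("rbd_header.myimage", events.append)   # client registers a watch
reg.notify("rbd_header.myimage", "snapshot created")  # another client notifies
assert events == ["snapshot created"]
```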
3.3 Exclusive Locks
Exclusive locks prevent concurrent writes to the same RBD image, improving consistency.
3.4 Object Map Index
Tracks existence of RADOS objects to avoid unnecessary operations on non‑existent objects.
3.5 Data Striping
Striping splits data across multiple objects to increase throughput; parameters include object size, stripe unit, and stripe count.
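The offset-to-object mapping implied by these three parameters can be sketched as follows. This is a simplified model of RADOS striping, assuming the semantics described above: data is written in stripe-unit chunks, round-robin across a set of `stripe_count` objects, moving to a new object set once each object reaches `object_size`:

```python
def locate(offset: int, object_size: int, stripe_unit: int, stripe_count: int):
    """Map a byte offset in a striped image to (object index, offset in object)."""
    stripe_width = stripe_unit * stripe_count    # bytes per full stripe
    objectset_span = object_size * stripe_count  # bytes per object set
    objectset, rem = divmod(offset, objectset_span)
    stripe_no, in_stripe = divmod(rem, stripe_width)
    unit_no, in_unit = divmod(in_stripe, stripe_unit)
    obj = objectset * stripe_count + unit_no
    return obj, stripe_no * stripe_unit + in_unit

# With 4 MiB objects, 1 MiB stripe units, and 4 objects per set,
# byte 5 MiB lands in object 1 at offset 1 MiB.
MiB = 2**20
assert locate(5 * MiB, 4 * MiB, MiB, 4) == (1, MiB)
```

Sequential writes thus fan out across `stripe_count` objects (and their OSDs) in parallel, which is where the throughput gain comes from.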
rbd -p mypool create myimage --size 102400 --image-features 5
rbd -p mypool create myimage --size 102400 --image-features 13
(Feature value 5 enables layering and exclusive locking; 13 additionally enables the object map.)
Chapter 4 Encryption
Ceph can encrypt OSD data and journal partitions with LUKS; ceph-ansible manages the setup, and the encryption keys are stored securely in the monitors' key/value store.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.