Understanding Ceph Architecture: RADOS, OSD, PG Mapping and Data Placement
This article explains Ceph's distributed storage architecture, covering its origins, RADOS client interactions, cluster map updates, the roles of OSDs, Monitors, metadata clusters, and the three-step mapping process from files to objects, placement groups, and finally to storage devices using the CRUSH algorithm.
Ceph, initiated in 2004, is a unified distributed storage system designed for high performance, reliability, and scalability.
Clients interact with the RADOS system by obtaining a ClusterMap from an OSD or Monitor, computing object locations locally, and then communicating directly with the relevant OSDs. As long as the ClusterMap remains current, no separate metadata server is needed for object lookup.
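The lookup flow above can be sketched in a few lines. This is a hypothetical illustration, not Ceph's real client API: the `ClusterMap` fields, the MD5-based hash, and the modular OSD choice (a stand-in for CRUSH) are all assumptions for demonstration.

```python
import hashlib

class ClusterMap:
    """Hypothetical, simplified view of the map a client caches."""
    def __init__(self, epoch, osds, pg_num):
        self.epoch = epoch    # map version; bumped on OSD failure or expansion
        self.osds = osds      # known OSD addresses
        self.pg_num = pg_num  # placement groups in the pool

def locate(cluster_map, object_id):
    """Purely local computation: no metadata-server round trip."""
    h = int(hashlib.md5(object_id.encode()).hexdigest(), 16)
    pg = h % cluster_map.pg_num                          # object -> PG
    osd = cluster_map.osds[pg % len(cluster_map.osds)]   # PG -> OSD (CRUSH stand-in)
    return pg, osd

cmap = ClusterMap(epoch=42, osds=["osd.0", "osd.1", "osd.2"], pg_num=128)
pg, osd = locate(cmap, "rbd_data.1234.0000000000000000")
# The client now talks to `osd` directly; the same inputs always yield
# the same answer, so every client computes identical placements.
```

Because the computation is deterministic, any client holding the same ClusterMap epoch reaches the same OSD without coordination.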
ClusterMap updates occur only when OSDs fail or the cluster expands, events that are far less frequent than normal data accesses.
OSDs rely on the underlying filesystem's extended attributes (xattrs) to store object state and metadata. ext4 limits xattrs to 4 KB, XFS allows up to 64 KB, and Btrfs imposes no practical limit but is less stable, making XFS the recommended choice for production.
The Ceph logical architecture includes Clients (data slicing and CRUSH-based object location), OSDs (data storage, replication, recovery, and reporting), Monitors (cluster state monitoring and mapping), and optional Metadata Clusters for CephFS.
A cluster is divided into Pools, each containing multiple Placement Groups (PGs); objects are split from files, mapped to PGs, and PGs are then mapped to a set of OSDs via the CRUSH algorithm, enabling dynamic object-to-OSD placement and simplifying data distribution.
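A minimal data model can make the Pool → PG → OSD hierarchy concrete. The pool parameters, OSD ids, and the round-robin placement function below are illustrative assumptions, not Ceph's actual CRUSH behavior.

```python
# Illustrative model of the Pool -> PG -> OSD hierarchy (all values assumed).
pool = {
    "name": "rbd",
    "pg_num": 8,   # placement groups in this pool
    "size": 3,     # replica count
}
osds = list(range(6))  # six OSD ids: 0..5

def pg_to_osds(pg_id, osds, size):
    # Stand-in for CRUSH: pick `size` distinct OSDs deterministically.
    start = pg_id % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(size)]

# One acting set per PG; objects are never tracked individually.
acting_sets = {pg: pg_to_osds(pg, osds, pool["size"])
               for pg in range(pool["pg_num"])}
```

The key property this models: placement state exists per PG, not per object, so remapping a PG to new OSDs moves all of its objects as one unit.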
Without PGs, OSDs would need to exchange information for millions of objects, leading to prohibitive maintenance overhead; PGs reduce this by grouping objects and limiting inter‑OSD communication.
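Some back-of-envelope arithmetic shows the scale of the difference. The cluster sizes below are assumptions chosen only to illustrate the orders of magnitude involved.

```python
# Assumed cluster parameters for a rough comparison.
objects = 10_000_000   # objects stored cluster-wide
osd_count = 100
replicas = 3
pg_num = 4096          # a typical order of magnitude for this OSD count

# Without PGs: replication/peering state would be tracked per object,
# on the order of tens of millions of relations.
per_object_relations = objects * replicas

# With PGs: each OSD only peers about the PGs it hosts,
# a few hundred entries at most.
pgs_per_osd = pg_num * replicas / osd_count
```

With these numbers, per-object tracking means 30 million relations, while each OSD hosts only around 120 PGs: the state OSDs must exchange shrinks by several orders of magnitude.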
The data addressing process involves three mappings: File → Object (splitting files into fixed‑size objects), Object → PG (hashing object IDs to PG IDs), and PG → OSD (using CRUSH to select N OSDs for each PG, typically with at least two replicas).
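The three mappings can be sketched end to end. This is a simplified model: the 4 MiB object size is a common default, but the naming scheme, the SHA-1 hashing, and the rendezvous-style OSD ranking used in place of CRUSH are assumptions for illustration.

```python
import hashlib

OBJECT_SIZE = 4 * 2**20  # 4 MiB objects (a common default)

def file_to_objects(file_id, file_size):
    """Mapping 1, File -> Object: split into fixed-size stripes."""
    count = (file_size + OBJECT_SIZE - 1) // OBJECT_SIZE
    return [f"{file_id}.{i:08x}" for i in range(count)]

def object_to_pg(oid, pg_num):
    """Mapping 2, Object -> PG: stable hash of the object id."""
    return int(hashlib.sha1(oid.encode()).hexdigest(), 16) % pg_num

def pg_to_osds(pg_id, osd_ids, n):
    """Mapping 3, PG -> OSD: stand-in for CRUSH selecting N distinct OSDs."""
    ranked = sorted(osd_ids,
                    key=lambda o: hashlib.sha1(f"{pg_id}:{o}".encode()).digest())
    return ranked[:n]

# A 10 MiB file splits into 3 objects, each landing on 3 of 12 OSDs.
objs = file_to_objects("10000001", 10 * 2**20)
placement = {o: pg_to_osds(object_to_pg(o, 128), range(12), 3) for o in objs}
```

Note that every step is a pure function of its inputs, which is what lets any client recompute the full placement from the ClusterMap alone.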
Proper sizing of PGs and sufficient numbers of OSDs (tens to hundreds) are crucial for balanced data distribution and system performance.
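A commonly cited sizing heuristic from the Ceph documentation targets roughly 100 PGs per OSD: total PGs ≈ (OSD count × 100) / replica count, rounded up to a power of two. A small helper can sketch that calculation; the function name is an assumption, not a Ceph tool.

```python
def suggest_pg_num(osd_count, replicas, target_pgs_per_osd=100):
    """Heuristic pg_num: (OSDs * target) / replicas, rounded up to 2^k."""
    raw = osd_count * target_pgs_per_osd / replicas
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num

suggest_pg_num(12, 3)    # 12 OSDs, 3 replicas -> raw 400 -> 512
```

Undersized pg_num leads to uneven data distribution; oversized pg_num raises per-OSD resource use, so the power-of-two rounding is a compromise, not a hard rule.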
Architects' Tech Alliance