Mastering Ceph: Core Architecture, Data Flow, and Easy CephFS Deployment

This article provides a comprehensive overview of Ceph's distributed storage architecture, explains the CRUSH algorithm, data placement, OSD, monitor, and MDS components, and offers step‑by‑step instructions for installing and configuring a basic CephFS cluster.

MaGe Linux Operations

Jul 7, 2016

Overview

Ceph is a distributed storage system whose development began in 2004, originally aimed at building a next‑generation, high‑performance distributed file system. With the rise of cloud computing, Ceph gained popularity as a storage backend for OpenStack.

CRUSH algorithm

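The CRUSH algorithm (Controlled Replication Under Scalable Hashing) replaces traditional centralized metadata lookups with a deterministic, hash‑based placement calculation, in the spirit of consistent hashing. It is aware of failure domains, so replicas can be spread across racks, rooms, or data centers, and it can scale to thousands of storage nodes.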
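To make the idea of metadata‑free, failure‑domain‑aware placement concrete, here is a minimal Python sketch. It is not CRUSH itself: it uses rendezvous (highest‑random‑weight) hashing as a stand‑in, and the topology, the `score` function, and the `place` function are all hypothetical names chosen for illustration.

```python
import hashlib
from typing import Dict, List

# Hypothetical topology for illustration: rack name -> OSD ids in that rack.
TOPOLOGY: Dict[str, List[int]] = {
    "rack-a": [0, 1, 2],
    "rack-b": [3, 4, 5],
    "rack-c": [6, 7, 8],
}


def score(key: str, item: str) -> int:
    """Deterministic pseudo-random score for a (key, item) pair."""
    digest = hashlib.sha256(f"{key}/{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, replicas: int,
          topology: Dict[str, List[int]] = TOPOLOGY) -> List[int]:
    """Pick `replicas` OSDs for `key`, each from a different rack.

    Stand-in for CRUSH: any client that knows the topology computes the
    same answer without asking a central metadata service.
    """
    if replicas > len(topology):
        raise ValueError("not enough racks to isolate every replica")
    # Rank racks by score and keep the best `replicas` of them.
    racks = sorted(topology, key=lambda rack: score(key, rack),
                   reverse=True)[:replicas]
    # Inside each chosen rack, take the best-scoring OSD.
    return [max(topology[rack], key=lambda osd: score(key, f"osd.{osd}"))
            for rack in racks]


if __name__ == "__main__":
    print(place("pg-1.2a", 3))
```

Because the result depends only on the key and the topology, every client reaches the same placement independently, and no two replicas share a rack.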

High availability

Administrators define the number of data replicas, and CRUSH chooses their physical locations so that replicas sit in separate failure domains. Ceph keeps the replicas strongly consistent and recovers lost data automatically and in parallel.

High scalability

Ceph has no central node in the data path; clients talk to OSDs directly rather than through a single proxy, so as the cluster grows, performance scales roughly linearly with the number of disks.

Rich features

Ceph supports three access interfaces: object storage, block storage, and a mountable filesystem. All three can be used simultaneously, and many cloud environments use Ceph as the sole storage backend for OpenStack.

Ceph Basic Structure

Basic components diagram

At the bottom lies RADOS, the core distributed storage layer, written in C++. Clients use the native librados API (C/C++) to communicate with the cluster over sockets.

RADOS Gateway (RGW) provides S3/Swift‑compatible RESTful APIs, while RBD offers a block‑device interface commonly used with KVM/QEMU. CephFS supplies a POSIX filesystem that clients mount through the kernel driver (a FUSE client also exists).

Ceph Core Components

OSD

Stores all data and objects, handles replication, recovery, back‑filling, and rebalancing. Each OSD sends heartbeats and reports to monitors.

MDS (optional)

Provides metadata services for CephFS; not required unless the filesystem interface is used.

Monitor

Tracks cluster state and maintains the authoritative cluster map, so that every daemon and client works from a consistent view of the cluster.

OSD Details

Data storage process

All data is split into objects (typically 2 MiB or 4 MiB). Each object receives a unique OID composed of a file ID (ino) and a chunk number (ono). Objects are placed into Placement Groups (PGs), which act like index buckets for efficient lookup and migration.
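The splitting step can be sketched in a few lines of Python. This is a simplified illustration that assumes fixed‑size 4 MiB objects with no striping across objects; the `ino.ono` hexadecimal naming below resembles what CephFS uses but should be treated as an assumption, and `object_name` and `objects_for_range` are hypothetical helpers.

```python
from typing import List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB, one of the sizes mentioned above


def object_name(ino: int, ono: int) -> str:
    """Build an OID from a file ID (ino) and a chunk number (ono)."""
    return f"{ino:x}.{ono:08x}"


def objects_for_range(ino: int, offset: int, length: int,
                      object_size: int = OBJECT_SIZE
                      ) -> List[Tuple[str, int, int]]:
    """Return (oid, offset_in_object, byte_count) for each object touched
    by the byte range [offset, offset + length) of file `ino`."""
    pieces = []
    end = offset + length
    while offset < end:
        ono = offset // object_size          # which chunk of the file
        within = offset % object_size        # where the range starts in it
        count = min(object_size - within, end - offset)
        pieces.append((object_name(ino, ono), within, count))
        offset += count
    return pieces


if __name__ == "__main__":
    MiB = 1024 * 1024
    # A 2 MiB write starting at 3 MiB crosses the first object boundary.
    for piece in objects_for_range(0x10000000000, 3 * MiB, 2 * MiB):
        print(piece)
```

A 2 MiB write that starts 3 MiB into the file touches two objects: the last 1 MiB of chunk 0 and the first 1 MiB of chunk 1.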

Conceptually, PG assignment is computed as pg_id = hash(oid) % num_pg. The number of PGs affects how evenly data is spread: with too few PGs, some OSDs can end up noticeably fuller than others.

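# Conceptual pseudocode: crush() stands for the CRUSH placement calculation
locator = object_name
obj_hash = hash(locator)
pg = obj_hash % num_pg
osds_for_pg = crush(pg)     # ordered list of OSDs for this PG
primary = osds_for_pg[0]    # the first OSD acts as the primary
replicas = osds_for_pg[1:]  # the remaining OSDs hold the replicas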

PGs are replicated according to the configured replica count and stored on different OSDs via CRUSH.
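The object → PG → OSD chain described above can be turned into a small runnable Python sketch. It rests on loud assumptions: SHA‑256 replaces Ceph's own hash function, a plain modulo replaces Ceph's actual PG mapping, and `crush_stand_in` is a hypothetical ranking function, not the real CRUSH algorithm.

```python
import hashlib
from typing import List, Tuple


def stable_hash(text: str) -> int:
    """Hash that gives the same value on every machine and every run
    (unlike Python's built-in hash(), which is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")


def pg_for_object(object_name: str, num_pg: int) -> int:
    """pg = hash(object_name) % num_pg, as in the pseudocode above."""
    return stable_hash(object_name) % num_pg


def crush_stand_in(pool_id: int, pg: int, osds: List[int],
                   size: int) -> List[int]:
    """Rank OSDs by a per-PG hash and keep the first `size` of them.
    This is NOT CRUSH; it only mimics its deterministic behaviour."""
    if size > len(osds):
        raise ValueError("replica count exceeds the number of OSDs")
    ranked = sorted(osds,
                    key=lambda osd: stable_hash(f"{pool_id}.{pg:x}/osd.{osd}"),
                    reverse=True)
    return ranked[:size]


def locate(object_name: str, pool_id: int, num_pg: int,
           osds: List[int], size: int) -> Tuple[int, int, List[int]]:
    """Return (pg, primary_osd, replica_osds) for an object."""
    pg = pg_for_object(object_name, num_pg)
    acting = crush_stand_in(pool_id, pg, osds, size)
    return pg, acting[0], acting[1:]


if __name__ == "__main__":
    print(locate("10000000000.00000000", pool_id=1, num_pg=256,
                 osds=[0, 1, 2, 3, 4, 5], size=3))
```

Placement is decided per PG, not per object: all objects that hash to the same PG share the same primary and replicas, which keeps lookup and migration bookkeeping small.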

OSD journal

Each OSD maintains a journal (5 GiB by default) that buffers writes before they are applied to the data store, similar to the redo log in MySQL InnoDB. Placing journals on SSDs improves write performance.

Monitor Nodes

Monitors listen on TCP 6789, store the latest cluster map, and use the Paxos algorithm for consistency. Clients download the map, compute OSD locations via CRUSH, and communicate directly with OSDs.
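Because Paxos needs a majority of monitors to agree, the number of monitors determines how many failures the cluster map service can survive. The arithmetic can be sketched as follows; `quorum_size` and `tolerated_failures` are hypothetical helper names.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitors < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority remains."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    for count in (1, 3, 4, 5):
        print(count, quorum_size(count), tolerated_failures(count))
```

Three monitors tolerate one failure and so do four, which is why odd monitor counts are the usual choice.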

Recommended Architecture

Separate the public and cluster networks so that client I/O does not compete with inter‑OSD replication and recovery traffic.

MDS (Metadata Server)

MDS is required only for CephFS; it caches metadata in memory, while the metadata itself is persisted as objects on the OSDs.

Simple CephFS Installation

Prepare password‑less SSH, synchronize hosts, disable firewalls, and install the ceph-deploy tool from the official repository:

yum install -y ceph-deploy

Create a working directory, generate a new cluster with node1 as the first monitor, and configure basic settings in ceph.conf (replica size, networks, etc.):

# node2, node3 and node4 each provide one OSD below, so at most 3 replicas can be placed
echo "osd pool default size = 3" >> ceph.conf
echo "osd pool default min size = 2" >> ceph.conf
echo "public network = 192.168.120.0/24" >> ceph.conf
echo "cluster network = 10.0.0.0/8" >> ceph.conf
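Replica settings are easy to get wrong in a small cluster, so a quick sanity check before deploying can help. The sketch below parses a ceph.conf‑style fragment with Python's standard `configparser` and compares the replica settings with the number of OSD hosts. It assumes the settings live in a `[global]` section, that both replica keys are present, and that each replica is placed on a different host; `load_global` and `check_replicas` are hypothetical helpers, not part of any Ceph tool.

```python
import configparser
from typing import Dict, List


def load_global(conf_text: str) -> Dict[str, str]:
    """Read the [global] section; spaces in keys are normalised to
    underscores so both spellings of an option look the same."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    return {key.replace(" ", "_"): value
            for key, value in parser["global"].items()}


def check_replicas(settings: Dict[str, str], osd_hosts: int) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    size = int(settings["osd_pool_default_size"])
    min_size = int(settings["osd_pool_default_min_size"])
    problems = []
    if size > osd_hosts:
        problems.append(
            f"size={size} needs {size} hosts, only {osd_hosts} available")
    if min_size > size:
        problems.append(f"min_size={min_size} is larger than size={size}")
    return problems


if __name__ == "__main__":
    sample = """
[global]
osd pool default size = 3
osd_pool_default_min_size = 2
"""
    print(check_replicas(load_global(sample), osd_hosts=3))
```

An empty list means the replica count fits the available hosts; any message points to a setting that would leave placement groups unable to reach their full replica count.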

Deploy monitors, OSDs, and MDS, create pools, and finally create the CephFS filesystem:

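# Bootstrap the initial monitor and gather the keys
ceph-deploy mon create-initial
# Prepare and activate one OSD on each storage node
ceph-deploy osd prepare node2:/dev/sdb1 node3:/dev/sdb1 node4:/dev/sdb1
ceph-deploy osd activate node2:/dev/sdb1 node3:/dev/sdb1 node4:/dev/sdb1
# Start a metadata server on node1
ceph-deploy mds create node1
# Create two pools with 256 placement groups each
ceph osd pool create test1 256
ceph osd pool create test2 256
# ceph fs new <fs_name> <metadata_pool> <data_pool>: test2 holds metadata, test1 holds data
ceph fs new cephfs test2 test1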

Verify the cluster status with ceph -s; HEALTH_OK indicates a successful deployment.
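For scripted checks, the same status can be read as JSON. The following Python sketch assumes the `ceph` CLI is on the PATH with a working keyring, and that the health value appears under `health.status` or, on older releases, `health.overall_status`; `health_status` and `cluster_health` are hypothetical helper names.

```python
import json
import subprocess


def health_status(status_doc: dict) -> str:
    """Extract the health string from a parsed `ceph status` document.
    The key name differs between releases, so both variants are tried."""
    health = status_doc.get("health", {})
    return health.get("status") or health.get("overall_status") or "UNKNOWN"


def cluster_health() -> str:
    """Run `ceph status --format json` and return the health string."""
    result = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return health_status(json.loads(result.stdout))


if __name__ == "__main__":
    state = cluster_health()
    print(state)
    # Non-zero exit code lets shell scripts and monitoring react to problems.
    raise SystemExit(0 if state == "HEALTH_OK" else 1)
```

Keeping the parsing in `health_status` separate from the subprocess call makes it possible to test the logic against saved JSON without a running cluster.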
