
Unlocking Ceph: How Distributed Storage Powers Modern Cloud Infrastructures

This article explains the fundamentals of Ceph, a high‑performance, highly available and scalable distributed storage system, covering its architecture, core components, data placement algorithms, storage interfaces, and typical deployment scenarios in cloud environments.

Open Source Linux

Jun 28, 2020


Ceph Overview

What is distributed storage? Imagine many servers, each with multiple disks, combined by software into a single logical storage pool. Users access this pool through a unified interface; files are split into small chunks and stored across different servers and disks, providing redundancy and fault tolerance.

Ceph is a unified distributed storage system designed for performance, reliability, and scalability. It offers file, block, and object storage from a single cluster and can be expanded dynamically. Many Chinese cloud providers use Ceph as the sole storage backend for OpenStack to improve data transfer efficiency.

Ceph originated from the doctoral research of Sage Weil (first results published in 2004) and was later contributed to the open‑source community. After years of development, it is now supported by many cloud vendors; Red Hat and OpenStack both integrate with Ceph, for example to store virtual‑machine images.

Official site: https://ceph.com/

Documentation: http://docs.ceph.org.cn/rados/

Ceph Features

High Performance

Uses the CRUSH algorithm instead of centralized metadata lookup, achieving balanced data distribution and high parallelism.

Considers fault‑domain isolation, allowing replica placement rules across rooms, racks, etc.

Scales to thousands of storage nodes, handling TB to PB‑level data.

High Availability

Replica count is flexible (typically three copies in production).

Supports fault‑domain separation and strong data consistency.

Automatically repairs various failure scenarios.

No single point of failure; the system automatically restores missing replicas.

High Scalability

Decentralized architecture.

Flexible expansion.

Capacity and performance grow roughly linearly as nodes are added.

Rich Features

Supports three storage interfaces: block (raw disks), file (POSIX directories), and object (key‑value storage).

Customizable interfaces and multi‑language drivers.

Ceph Application Scenarios

Ceph provides object storage, block device storage, and file system services. Its object storage can back cloud‑drive applications (e.g., ownCloud). Its block storage integrates with IaaS platforms such as OpenStack, CloudStack, ZStack, and Eucalyptus, and with the KVM hypervisor.

Ceph offers three main functions:

Object Storage (RADOSGW) : RESTful API, compatible with S3 and Swift.

Block Storage (RBD) : Provides virtual disks with built‑in disaster‑recovery.

File System (CephFS) : POSIX‑compatible network file system for high‑performance, large‑capacity storage.

What are block, object, and file system storage?

Object storage : Key‑value store with simple GET/PUT/DELETE APIs (e.g., Swift, S3).

Block storage : Exposes a block‑device interface, consumed for example through a Linux kernel block device or a QEMU driver (e.g., RBD, EBS).

File system storage : POSIX‑compatible interface (e.g., CephFS, GlusterFS, HDFS, NFS, NAS).

Ceph Core Components

Monitors (MON) : Maintain cluster maps, provide authentication and logging.

Metadata Server (MDS) : Stores metadata for CephFS (not needed for block or object storage).

OSD (Object Storage Daemon) : Typically one daemon per disk; stores data as objects and handles replication, recovery, backfilling, and rebalancing.

RADOS : Reliable Autonomic Distributed Object Store, the foundation layer that stores all objects.

librados : Library offering native APIs for applications.

RADOSGW : Gateway providing S3/Swift‑compatible RESTful object storage.

RBD : Block device interface built on top of RADOS.

CephFS : POSIX‑compatible file system built on top of RADOS, with its metadata managed by the MDS.

Ceph Logical Layer Structure

RADOS System Logical Structure

Ceph Data Storage Process

How a File Is Stored and Retrieved in Ceph

When a user uploads a file, Ceph splits it into equal‑sized objects. Each object is hashed and placed into a Placement Group (PG), which is then mapped to one or more OSDs.

All storage types (object, block, file) break data into objects of configurable size (typically 2 MiB or 4 MiB). Each object receives a unique OID composed of the file ID (ino) and the object number (ono).

Example: File ID A split into two objects yields OIDs A0 and A1.
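
The splitting step can be sketched in a few lines of Python. This is only an illustration of the idea, not Ceph's implementation: the function name `split_into_objects` and the simple "file ID + object number" OID format are assumptions taken from the example above, and real Ceph object names look different.

```python
from typing import Iterator, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB, one of the typical sizes mentioned above


def split_into_objects(
    ino: str, data: bytes, object_size: int = OBJECT_SIZE
) -> Iterator[Tuple[str, bytes]]:
    """Yield (oid, chunk) pairs, where oid = file ID (ino) + object number (ono)."""
    for ono, offset in enumerate(range(0, len(data), object_size)):
        yield f"{ino}{ono}", data[offset:offset + object_size]


# A 6 MiB file with ID "A" becomes two objects: A0 (4 MiB) and A1 (2 MiB).
objects = list(split_into_objects("A", bytes(6 * 1024 * 1024)))
print([(oid, len(chunk)) for oid, chunk in objects])
# → [('A0', 4194304), ('A1', 2097152)]
```

Note that the last object may be smaller than the configured object size when the file length is not an exact multiple of it.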

Ceph Logical Mapping Layers

File → Object mapping.

Object → PG mapping using hash(oid) & mask → pgid (the mask is the PG count minus 1 when the PG count is a power of two).

PG → OSD mapping via the CRUSH algorithm.
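
A minimal sketch of the object → PG step, under stated assumptions: `PG_NUM`, `object_to_pg`, and the pool ID are hypothetical, and `zlib.crc32` is only a stand-in for the hash Ceph actually uses (rjenkins by default). The `pool.hex` form of the PG ID follows the way Ceph displays PG IDs.

```python
import zlib

PG_NUM = 128          # hypothetical PG count; a power of two so the mask works
MASK = PG_NUM - 1     # 0b1111111 keeps the low 7 bits of the hash


def object_to_pg(oid: str, pool_id: int = 1) -> str:
    """Map an object ID to a PG ID of the form '<pool>.<pg in hex>'."""
    h = zlib.crc32(oid.encode("utf-8"))  # stand-in hash, not Ceph's own
    pg = h & MASK                        # hash(oid) & mask
    return f"{pool_id}.{pg:x}"


for oid in ("A0", "A1"):
    print(oid, "->", object_to_pg(oid))
```

Because the mapping is a pure function of the object name and the PG count, any client can compute it without asking a central server.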

CRUSH (Controlled Replication Under Scalable Hashing) replaces metadata tables with a deterministic algorithm that computes data placement. It takes the cluster topology into account and selects the OSDs that hold each replica, which provides fault tolerance and enables self‑management and self‑healing.
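
The following sketch is not CRUSH. It uses rendezvous (highest‑random‑weight) hashing as a much simpler stand‑in to show the two properties described above: placement is computed rather than looked up, and replicas land in different fault domains. The topology, `place_pg`, and `_score` are all hypothetical names for illustration.

```python
import hashlib
from typing import Dict, List

# Hypothetical topology: rack name -> OSD ids in that rack.
TOPOLOGY: Dict[str, List[int]] = {
    "rack-a": [0, 1, 2],
    "rack-b": [3, 4, 5],
    "rack-c": [6, 7, 8],
}


def _score(pgid: str, item: str) -> int:
    """Deterministic pseudo-random score for a (PG, item) pair."""
    digest = hashlib.sha256(f"{pgid}/{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_pg(pgid: str, replicas: int = 3) -> List[int]:
    """Pick one OSD from each of `replicas` distinct racks; the first is primary."""
    racks = sorted(TOPOLOGY, key=lambda r: _score(pgid, r), reverse=True)[:replicas]
    return [
        max(TOPOLOGY[rack], key=lambda osd: _score(pgid, f"osd.{osd}"))
        for rack in racks
    ]


print(place_pg("1.2f"))  # same input always yields the same OSD list
```

Every client that knows the topology computes the same answer for a given PG, so no lookup table has to be stored or queried.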

RADOS Advantages Over Traditional Distributed Storage

Maps files to objects and uses CRUSH to locate data, avoiding block‑map lookups.

Leverages OSD intelligence to maximize scalability.

Ceph I/O Flow and Data Distribution

Normal I/O Flow

Steps:

Client creates a cluster handler.

Client reads the configuration file.

Client connects to monitors to obtain the cluster map.

Client issues I/O requests; CRUSH determines the primary OSD.

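Primary OSD writes the data locally and forwards it to the replica OSDs (two of them with the typical three‑copy setting).

Primary OSD waits for acknowledgments from the replicas.

After all copies are written, the primary acknowledges the client, which then sees the I/O as complete.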
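
The replication part of this flow can be modeled with a small in‑memory simulation. It is a sketch only: the `OSD` class and `client_write` are hypothetical, there is no network or failure handling, and it assumes the primary collects the replica acknowledgments and returns a single completion to the client.

```python
from typing import Dict, List


class OSD:
    """In-memory stand-in for an OSD; stores objects in a dict."""

    def __init__(self, osd_id: int) -> None:
        self.osd_id = osd_id
        self.store: Dict[str, bytes] = {}

    def write_local(self, oid: str, data: bytes) -> bool:
        self.store[oid] = data
        return True  # acknowledgment


def client_write(acting_set: List[OSD], oid: str, data: bytes) -> bool:
    """Send a write to the primary (first OSD), which replicates to the rest."""
    primary, replicas = acting_set[0], acting_set[1:]
    acks = [primary.write_local(oid, data)]
    # The primary forwards the write and gathers one ack per replica.
    acks += [replica.write_local(oid, data) for replica in replicas]
    return all(acks)  # one completion goes back to the client


osds = [OSD(i) for i in range(3)]
ok = client_write(osds, "A0", b"hello")
print(ok, [osd.store["A0"] for osd in osds])
# → True [b'hello', b'hello', b'hello']
```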

New Primary I/O Flow

When a new OSD is chosen as primary to replace a failed one, it initially holds no PG data. An existing replica OSD temporarily acts as primary and syncs the data to the new OSD; once synchronization finishes, the new OSD takes over as primary.
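
A heavily simplified model of this handover, assuming that a surviving replica OSD serves as the temporary primary while the new OSD is filled. The function `promote_new_primary`, the OSD numbers, and the ordering of the returned lists are illustrative assumptions, not Ceph's actual peering logic.

```python
from typing import Dict, List, Tuple

Store = Dict[str, bytes]


def promote_new_primary(
    stores: Dict[int, Store], survivors: List[int], new_osd: int
) -> Tuple[List[int], List[int]]:
    """Return (temporary acting set, final acting set) after syncing the new OSD."""
    temp_primary = survivors[0]
    temp_acting = survivors + [new_osd]  # a survivor serves I/O in the meantime
    # Sync: copy every object the temporary primary holds to the new OSD.
    stores[new_osd].update(stores[temp_primary])
    final_acting = [new_osd] + survivors  # the new OSD now leads the set
    return temp_acting, final_acting


stores = {1: {"A0": b"x", "A1": b"y"}, 2: {"A0": b"x", "A1": b"y"}, 3: {}}
temp, final = promote_new_primary(stores, survivors=[1, 2], new_osd=3)
print(temp, final)
# → [1, 2, 3] [3, 1, 2]
```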

Ceph Pool and PG Distribution

A pool is a logical namespace that contains a configurable number of PGs. Objects within PGs are mapped to OSDs across the cluster. Pools can be used for fault‑domain isolation based on different user scenarios.
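
To illustrate how pools keep their PGs separate, here is a small sketch. The `Pool` class, the pool names, and their PG and replica counts are hypothetical, and `zlib.crc32` again stands in for Ceph's own hash.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    pool_id: int
    name: str
    pg_num: int    # assumed to be a power of two so the mask works
    replicas: int  # copies kept for each object in this pool

    def pg_of(self, oid: str) -> str:
        """Return the PG ID '<pool>.<pg in hex>' for an object in this pool."""
        h = zlib.crc32(oid.encode("utf-8"))  # stand-in hash, not Ceph's own
        return f"{self.pool_id}.{h & (self.pg_num - 1):x}"


vm_images = Pool(pool_id=1, name="vm-images", pg_num=128, replicas=3)
backups = Pool(pool_id=2, name="backups", pg_num=32, replicas=2)

# The same object name lands in a different PG in each pool.
print(vm_images.pg_of("A0"), backups.pg_of("A0"))
```

Because each pool carries its own PG count and replica count, different workloads can be given different durability and placement settings.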

Source: https://www.cnblogs.com/shuaiyin/p/11037909.html

