Why Ceph’s Unlimited Scalability Isn’t As Simple As It Looks

The article examines Ceph’s claimed infinite scalability, cost advantages, and operational stability from an SRE perspective, comparing it with centralized systems like HDFS, and reveals practical challenges such as expansion granularity, crushmap rebalancing, utilization limits, and maintenance overhead.

dbaplus Community

Scalability

Ceph advertises unlimited scalability thanks to its CRUSH algorithm and its lack of central metadata nodes. In practice, placement happens in two stages: each object is first hashed into a placement group (PG), and each PG is then mapped to a set of OSDs according to the cluster's crushmap. This two-stage mapping limits expansion granularity: a single expansion can only safely add machines within one fault domain (host, rack, etc.). If the fault domain is set too narrowly, adding a few hosts can leave some PGs without the required number of complete replicas, temporarily halting I/O to those PGs.
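The two-stage mapping can be sketched in Python to see how an expansion shifts PGs. This is a simplified model, not Ceph's CRUSH code: it assumes a SHA-256-based hash with a plain modulo in place of Ceph's own hash and stable-mod functions, and it replaces CRUSH's weighted buckets with unweighted rendezvous (highest-random-weight) hashing that picks one host per fault domain. The names `object_to_pg`, `pg_to_hosts` and `remapped_fraction` are hypothetical.

```python
from __future__ import annotations

import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def object_to_pg(obj_name: str, pg_num: int) -> int:
    """Stage 1: hash the object name into one of pg_num placement groups."""
    return stable_hash(obj_name) % pg_num


def pg_to_hosts(pg_id: int, hosts: dict[str, str], replicas: int) -> list[str]:
    """Stage 2 (rendezvous-hash stand-in for CRUSH).

    `hosts` maps host name -> fault domain. Hosts are ranked by a per-PG score,
    and the best-ranked host of each fault domain is taken until `replicas`
    distinct domains are covered.
    """
    ranked = sorted(hosts, key=lambda h: stable_hash(f"{pg_id}:{h}"), reverse=True)
    chosen: list[str] = []
    used_domains: set[str] = set()
    for host in ranked:
        if hosts[host] not in used_domains:
            chosen.append(host)
            used_domains.add(hosts[host])
            if len(chosen) == replicas:
                return chosen
    raise ValueError("fewer fault domains than replicas: PGs cannot be fully placed")


def remapped_fraction(before: dict[str, str], after: dict[str, str],
                      pg_num: int, replicas: int) -> float:
    """Fraction of PGs whose host set changes between two cluster layouts."""
    moved = sum(
        set(pg_to_hosts(pg, before, replicas)) != set(pg_to_hosts(pg, after, replicas))
        for pg in range(pg_num)
    )
    return moved / pg_num


if __name__ == "__main__":
    # Eight hosts spread over four racks, then two new hosts in a fifth rack.
    hosts = {f"host{i}": f"rack{i % 4}" for i in range(8)}
    grown = dict(hosts, host8="rack4", host9="rack4")
    print("object -> PG:", object_to_pg("rbd_data.1234", 256))
    print("PGs remapped by the expansion:", remapped_fraction(hosts, grown, 256, 3))
```

The remapped fraction is only a rough proxy for how much data an expansion sets in motion; a real cluster weights hosts by capacity and moves data per OSD, not per host.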

To enlarge the unit of expansion, operators can define larger fault domains (e.g., logical racks that each group several hosts) so that every expansion adds an entire domain at once, mitigating the replica-loss issue.
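A minimal sketch of the planning step, assuming a hypothetical naming scheme `lrack<N>` for logical racks: new hosts are grouped into complete logical racks, and a batch that would leave a rack partially filled is rejected. The resulting plan could then be applied with `ceph osd crush add-bucket <rack> rack` and `ceph osd crush move <host> rack=<rack>`, with the pool's CRUSH rule using `rack` as its failure domain. The function `assign_logical_racks` is illustrative only.

```python
from __future__ import annotations


def assign_logical_racks(existing: dict[str, str], new_hosts: list[str],
                         hosts_per_rack: int) -> dict[str, str]:
    """Return host -> logical rack, adding only complete racks.

    Assumes existing racks follow the hypothetical naming scheme lrack0, lrack1, ...
    """
    if hosts_per_rack <= 0:
        raise ValueError("hosts_per_rack must be positive")
    if len(new_hosts) % hosts_per_rack:
        raise ValueError(
            f"{len(new_hosts)} hosts do not fill whole racks of {hosts_per_rack}; "
            "defer the remainder to the next expansion"
        )
    duplicates = set(new_hosts) & set(existing)
    if duplicates:
        raise ValueError(f"hosts already placed: {sorted(duplicates)}")
    next_rack = len(set(existing.values()))
    plan = dict(existing)
    for i, host in enumerate(new_hosts):
        # Consecutive batches of hosts_per_rack hosts share one new logical rack.
        plan[host] = f"lrack{next_rack + i // hosts_per_rack}"
    return plan


if __name__ == "__main__":
    current = {"h0": "lrack0", "h1": "lrack0", "h2": "lrack1", "h3": "lrack1"}
    print(assign_logical_racks(current, ["h4", "h5", "h6", "h7"], 2))
```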

During expansion, the crushmap may change again if a disk fails mid-process, forcing affected PGs to be remapped and potentially prolonging the rebalancing period. Large clusters see disk failures frequently, so they go through repeated crushmap adjustments, I/O degradation, and occasional temporary data unavailability.
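The effect of a failure landing in the middle of a rebalance can be illustrated with the same kind of simplified placement model. This sketch assumes unweighted rendezvous hashing as a stand-in for CRUSH and ignores fault domains; `place` and `remap_waves` are hypothetical names. It counts the PGs moved by an expansion, the PGs re-targeted when one OSD then fails, and the overlap, i.e. PGs whose destination changes while they are still rebalancing.

```python
from __future__ import annotations

import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def place(pg_id: int, osds: list[str], replicas: int) -> frozenset[str]:
    """Simplified stand-in for CRUSH: the `replicas` OSDs with the highest per-PG score."""
    ranked = sorted(osds, key=lambda o: stable_hash(f"{pg_id}:{o}"), reverse=True)
    return frozenset(ranked[:replicas])


def remap_waves(old_osds: list[str], new_osds: list[str], failed_osd: str,
                pg_num: int, replicas: int = 3) -> tuple[set[int], set[int], set[int]]:
    """Return (PGs moved by the expansion, PGs re-targeted by the failure,
    PGs hit by both, i.e. already rebalancing when their target changed)."""
    survivors = [o for o in new_osds if o != failed_osd]
    expanding = {pg for pg in range(pg_num)
                 if place(pg, old_osds, replicas) != place(pg, new_osds, replicas)}
    retargeted = {pg for pg in range(pg_num)
                  if place(pg, new_osds, replicas) != place(pg, survivors, replicas)}
    return expanding, retargeted, expanding & retargeted


if __name__ == "__main__":
    old = [f"osd.{i}" for i in range(20)]
    new = old + [f"osd.{i}" for i in range(20, 24)]
    expanding, retargeted, both = remap_waves(old, new, "osd.3", pg_num=512)
    print(len(expanding), len(retargeted), len(both))
```

In a large cluster these waves overlap repeatedly, which is why rebalancing can stretch out well beyond what a single expansion would suggest.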

Ceph's PG count also needs careful planning. A higher PG count makes data distribution more even, but every PG consumes CPU, memory, and network resources. Community guidance suggests around 200 PGs per OSD, and because adding OSDs lowers the number of PGs each one hosts, administrators must revisit pg_num as the cluster grows to avoid performance penalties.
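The usual sizing arithmetic can be written down directly. This is a simplified form of the heuristic behind Ceph's older PG calculator, which targeted about 100 PGs per OSD; the target is left as a parameter so the article's figure of 200 can be used instead. The function names are hypothetical, and on recent releases the `pg_autoscaler` manager module can manage pg_num automatically.

```python
def suggested_pg_num(num_osds: int, replicas: int, target_pgs_per_osd: int) -> int:
    """Sizing heuristic: (OSDs x target PGs per OSD) / replicas,
    rounded up to the next power of two."""
    raw = num_osds * target_pgs_per_osd / replicas
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num


def pgs_per_osd(pg_num: int, replicas: int, num_osds: int) -> float:
    """Average number of PG replicas each OSD hosts."""
    return pg_num * replicas / num_osds


if __name__ == "__main__":
    # A fixed pg_num spreads thinner as OSDs are added, so it must be revisited.
    for osds in (10, 20, 40):
        print(osds, suggested_pg_num(osds, 3, 200), pgs_per_osd(1024, 3, osds))
```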

Cost

Compared with commercial storage solutions (EMC, IBM, cloud services), building a private Ceph cluster can reduce hardware costs, but total cost of ownership covers personnel and service quality as well as hardware. Ceph's decentralized, pseudo-random placement leads to uneven disk utilization: once overall cluster usage reaches roughly 70%, the fullest disks approach their limits and the system can become unstable, requiring manual reweighting of over-utilized disks.
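Manual reweighting usually follows a simple rule: lower the weight of OSDs that sit well above average utilization. The sketch below is similar in spirit to `ceph osd reweight-by-utilization`, which takes an overload threshold expressed as a percentage of average utilization, but it is not that command's algorithm. The defaults of 120% and a 0.05 step limit are assumptions chosen to resemble it, OSDs are assumed to have equal capacity, and `propose_reweights` is a hypothetical name.

```python
from __future__ import annotations


def propose_reweights(util: dict[str, float], weights: dict[str, float],
                      overload_pct: float = 120.0,
                      max_change: float = 0.05) -> dict[str, float]:
    """Propose lower override weights for OSDs above overload_pct% of mean utilization.

    Simplifications: OSDs are assumed to have equal capacity (plain mean), and
    each weight may drop by at most max_change per round.
    """
    mean = sum(util.values()) / len(util)
    threshold = mean * overload_pct / 100
    proposals: dict[str, float] = {}
    for osd, used in util.items():
        if used > threshold:
            target = weights[osd] * mean / used  # scale the weight toward the mean
            proposals[osd] = round(max(target, weights[osd] - max_change), 4)
    return proposals


if __name__ == "__main__":
    util = {"osd.0": 0.60, "osd.1": 0.62, "osd.2": 0.85, "osd.3": 0.58}
    weights = {osd: 1.0 for osd in util}
    print(propose_reweights(util, weights))
```

Capping the step size keeps each round's data movement small, which is also why reweighting tends to become a repeated cycle rather than a one-off fix.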

Because Ceph cannot steer new writes toward disks with free space, it stops accepting writes once any single OSD becomes full, whereas HDFS can direct new writes to less-used nodes. Operators are therefore forced to add capacity well before the cluster reaches 70% utilization, which can waste hardware and power.
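A back-of-the-envelope sketch shows why imbalance caps usable capacity. It assumes Ceph's default `mon_osd_full_ratio` of 0.95, equal-capacity OSDs, and that every OSD keeps filling in proportion to its current share of data; `usable_fraction` is a hypothetical name.

```python
def usable_fraction(utils: list[float], full_ratio: float = 0.95) -> float:
    """Average utilization at which the fullest OSD reaches full_ratio.

    Assumes equal-capacity OSDs that keep filling in proportion to their
    current share of data, so the imbalance (max / mean) stays constant.
    """
    mean = sum(utils) / len(utils)
    return full_ratio * mean / max(utils)


if __name__ == "__main__":
    even = [0.50, 0.50, 0.50, 0.50]
    skewed = [0.40, 0.45, 0.50, 0.65]
    print(f"even:   writes stop at ~{usable_fraction(even):.0%} average use")
    print(f"skewed: writes stop at ~{usable_fraction(skewed):.0%} average use")
```

In the skewed example the fullest OSD sits at 1.3 times the mean, so it reaches 95% when the cluster average is only about 73%, close to the 70% rule of thumb above.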

Stability and Operations

Stable operation depends heavily on team expertise and documentation quality. Red Hat’s curated Ceph documentation improves usability, but administrators still face challenges such as frequent reweighting cycles, monitoring for disk‑full conditions, and handling “stuck” PGs during rebalancing.

When a cluster approaches 80% utilization, the fullest disks can fill rapidly, prompting repeated manual interventions that increase operational overhead and the risk of human error. Early capacity alerts and proactive procurement can mitigate this, but they also risk over-provisioning.
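An early capacity alert can be sketched as a linear projection of per-OSD utilization against the procurement lead time. Utilization values are fractions (for example, derived from `ceph osd df` output); the uniform daily growth rate, the lead time, and the 70% alert level are hypothetical inputs, and the 0.85/0.95 thresholds are assumed to mirror Ceph's default nearfull and full ratios. The function names are illustrative.

```python
from __future__ import annotations

import math

NEARFULL = 0.85  # assumed to match Ceph's default nearfull ratio
FULL = 0.95      # assumed to match Ceph's default full ratio


def days_until(current: float, threshold: float, daily_growth: float) -> float:
    """Linear projection of days until `threshold`; inf if usage is not growing."""
    if current >= threshold:
        return 0.0
    if daily_growth <= 0:
        return math.inf
    return (threshold - current) / daily_growth


def capacity_alerts(utils: dict[str, float], daily_growth: float,
                    lead_time_days: float, alert_at: float = 0.70) -> dict[str, str]:
    """Flag OSDs that are already nearfull/full, or that will cross `alert_at`
    before new hardware (with a procurement lead time) could arrive."""
    alerts: dict[str, str] = {}
    for osd, used in utils.items():
        if used >= FULL:
            alerts[osd] = "full"
        elif used >= NEARFULL:
            alerts[osd] = "nearfull"
        elif days_until(used, alert_at, daily_growth) <= lead_time_days:
            alerts[osd] = "order capacity"
    return alerts


if __name__ == "__main__":
    utils = {"osd.0": 0.50, "osd.1": 0.66, "osd.2": 0.88, "osd.3": 0.96}
    print(capacity_alerts(utils, daily_growth=0.002, lead_time_days=30))
```

Lowering `alert_at` or lengthening the assumed lead time buys safety at the price of the over-provisioning described above.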

Ceph lacks built-in per-object metadata such as last_access_time, so hot/cold data tiering and garbage collection require custom development, adding further complexity for large deployments.
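Because access times must be tracked outside Ceph, a tiering or cleanup job needs an index of its own. A minimal in-memory sketch follows; `AccessTracker` and its methods are hypothetical, and a production version would persist the index (for example in a database) and hook `record()` into the application's or gateway's request path.

```python
from __future__ import annotations

import time
from typing import Callable


class AccessTracker:
    """Hypothetical application-side index of last access time per object key.

    Ceph is not asked for access times here; the application calls record()
    on every read or write it serves.
    """

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._last_access: dict[str, float] = {}

    def record(self, key: str) -> None:
        """Remember that `key` was accessed now."""
        self._last_access[key] = self._clock()

    def forget(self, key: str) -> None:
        """Drop a key after it has been migrated to cold storage or deleted."""
        self._last_access.pop(key, None)

    def cold_keys(self, idle_seconds: float) -> list[str]:
        """Keys not accessed for at least idle_seconds, in sorted order."""
        now = self._clock()
        return sorted(k for k, t in self._last_access.items() if now - t >= idle_seconds)


if __name__ == "__main__":
    tracker = AccessTracker()
    tracker.record("bucket/photo-001.jpg")
    print(tracker.cold_keys(idle_seconds=30 * 86400))
```

The injectable clock keeps the logic testable; the hard part in practice is keeping such an index consistent with the cluster, which is exactly the extra development effort the paragraph refers to.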

Conclusion

Ceph offers theoretically unlimited scalability, but practical expansion is constrained by fault-domain granularity, crushmap volatility, and PG management. Its cost savings are offset by uneven utilization and the need for diligent capacity planning. Ultimately, these trade-offs mean that no single storage solution is universally superior; the choice must align with specific workload requirements and operational capabilities.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
