
How to Optimize Ceph Cluster Hardware: CPU, RAM, Storage & Network Guidelines

This guide explains how to plan Ceph hardware by balancing CPU, memory, storage, network bandwidth, and failure domains, offering practical recommendations for daemons, OSDs, monitors, managers, and SSD vs HDD choices to achieve cost‑effective, high‑performance large‑scale clusters.

MaGe Linux Operations

May 26, 2020

Ceph is designed to run on commodity hardware, making ultra‑large clusters economically feasible. When planning hardware you must balance failure domains and performance considerations, distributing Ceph daemons and other processes across many hosts. Dedicated hosts should run specific Ceph daemons, while separate hosts handle workloads such as OpenStack or CloudStack.

CPU

Ceph metadata servers dynamically rebalance load, so they need strong CPUs (four cores or more). OSDs run RADOS services and should have reasonable processing power (e.g., dual‑core). Monitors are not CPU‑intensive. Avoid placing CPU‑heavy workloads (e.g., OpenStack Nova) on the same hosts as Ceph daemons; run them on separate machines.

RAM

More RAM is better. Memory usage for monitors and managers grows with cluster size: 1‑2 GB for small clusters, 5‑10 GB for large ones. Adjust settings such as mon_osd_cache_size or rocksdb_cache_size as needed.

OSDs (ceph‑osd)

BlueStore uses its own memory for caching instead of the OS page cache. The osd_memory_target can be tuned; values below 2 GB are discouraged, 2‑4 GB is typical, and >4 GB may improve performance for large datasets.

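Important: OSD memory auto‑tuning is “best effort”. An OSD can unmap memory so that the kernel may reclaim it, but the kernel is not guaranteed to reclaim it within any particular time frame. This is especially true on older Ceph versions, where transparent huge pages can prevent the kernel from reclaiming freed memory. Provision about 20 % extra RAM to avoid out‑of‑memory (OOM) kills during spikes.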
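The sizing advice above can be turned into a rough per-host RAM estimate. The following is a minimal sketch, not an official formula: the function name, the `other_bytes` allowance for the OS and other daemons, and the 12‑OSD example host are assumptions made for illustration.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def host_ram_budget_bytes(num_osds, osd_memory_target=4 * GIB,
                          headroom=0.20, other_bytes=0):
    """Estimate the RAM to provision on an OSD host.

    num_osds          -- number of OSD daemons on the host
    osd_memory_target -- per-OSD memory target in bytes (4 GiB assumed)
    headroom          -- extra fraction reserved for spikes (about 20 %)
    other_bytes       -- hypothetical allowance for the OS and other daemons
    """
    if num_osds < 0:
        raise ValueError("num_osds must be non-negative")
    osd_total = num_osds * osd_memory_target
    return int(osd_total * (1 + headroom)) + other_bytes


# Example: 12 OSDs at 4 GiB each, 20 % headroom, 8 GiB assumed for the OS.
budget = host_ram_budget_bytes(12, other_bytes=8 * GIB)
print(f"Provision roughly {budget / GIB:.1f} GiB of RAM")
```

The headroom is applied only to the OSD total, since that is the part subject to the best‑effort auto‑tuning described above.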

With FileStore, page cache handles data, and OSD memory consumption mainly depends on the number of PGs per daemon.

Data Storage

Plan storage carefully; simultaneous reads/writes from multiple daemons on a single drive can degrade performance. Use separate drives for the OS, OSD data, and OSD journals. Minimum HDD size is 1 TB; larger disks reduce cost per GB. Do not run multiple OSDs on the same disk, and avoid co‑locating OSDs with monitors or metadata servers.

Hard Disk Drives

Consider cost per GB when choosing disk sizes (e.g., $0.07/GB for 1 TB vs $0.05/GB for 3 TB). Use one dedicated drive per OSD daemon to avoid “slow OSD” issues.
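The per‑GB comparison is simple arithmetic. As a small sketch, the drive prices below are hypothetical values chosen to land near the per‑GB figures quoted above; substitute real quotes when comparing drives.

```python
def cost_per_gb(price_usd, capacity_tb):
    """Return the price per gigabyte, using decimal units (1 TB = 1000 GB)."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return price_usd / (capacity_tb * 1000)


# Hypothetical prices for illustration only: (price in USD, capacity in TB).
drives = {"1 TB": (75.0, 1), "3 TB": (150.0, 3)}

for name, (price, capacity_tb) in drives.items():
    print(f"{name}: ${cost_per_gb(price, capacity_tb):.3f}/GB")
```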

Solid State Drives

SSDs dramatically reduce latency and increase throughput, though they cost more per GB. A common compromise is to store OSD journals and metadata on SSDs while keeping object data on separate HDDs for cost efficiency.

When evaluating SSDs, check how they behave under write‑intensive load, check their sequential write performance, and confirm that their IOPS and write performance match or exceed those of the HDDs they serve. Align partitions properly.

Controllers

Disk controllers affect write throughput; choose wisely to avoid bottlenecks. Refer to Ceph blog posts on disk‑controller performance for details.

Additional Considerations

You can run multiple OSDs per host, but the combined throughput of the OSD drives should not exceed the host’s network bandwidth. Keep the kernel up to date, and check that your distribution provides a suitable glibc version and syncfs(2) support.
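A quick way to apply the throughput rule is to compare the summed drive throughput with the NIC's line rate. This is a rough sketch: the function name and the 150 MB/s per‑HDD figure are assumptions for illustration, and real throughput depends on the workload.

```python
def network_is_bottleneck(num_osds, mb_per_s_per_osd, nic_gbps):
    """Return True if combined OSD drive throughput exceeds NIC bandwidth.

    Uses decimal units: 1 Gbit/s = 1000 Mbit/s = 125 MB/s.
    """
    nic_mb_per_s = nic_gbps * 1000 / 8
    return num_osds * mb_per_s_per_osd > nic_mb_per_s


# Example: 12 HDD-backed OSDs, each assumed to sustain about 150 MB/s.
print(network_is_bottleneck(12, 150, nic_gbps=10))  # 1800 MB/s vs 1250 MB/s
print(network_is_bottleneck(12, 150, nic_gbps=25))  # 1800 MB/s vs 3125 MB/s
```

In this example the 10 Gbps link is the limiting factor, while a 25 Gbps link leaves headroom.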

Networks

Start with 10 Gbps or faster networking. Replicating 1 TB of data takes about 3 hours over a 1 Gbps network, versus about 20 minutes over 10 Gbps. Use VLANs (802.1q) for easier management, and consider an out‑of‑band (BMC) network for management traffic. Deploy three separate networks only after evaluating the capacity and performance trade‑offs.
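For a sense of scale, the ideal transfer time at full line rate can be computed directly. This sketch assumes decimal units and a hypothetical `efficiency` factor for protocol overhead. The ideal results come out lower than the times quoted above, which is what one would expect if real links do not sustain line rate.

```python
def transfer_time_seconds(size_tb, link_gbps, efficiency=1.0):
    """Return the time to move size_tb terabytes over a link_gbps link.

    efficiency is the assumed usable fraction of line rate (1.0 = ideal).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency)


for gbps in (1, 10):
    seconds = transfer_time_seconds(1, gbps)
    print(f"1 TB over {gbps} Gbps: {seconds / 60:.0f} minutes at line rate")
```

At line rate, 1 TB needs 8000 seconds (about 2.2 hours) on 1 Gbps and 800 seconds (about 13 minutes) on 10 Gbps.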

Failure Domains

A failure domain is any failure that prevents access to one or more OSDs (e.g., a daemon crash, a disk failure, or a power loss). Balance the number of OSDs per failure domain: concentrating many OSDs in few domains cuts cost, but each failure then takes more of the cluster offline.
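One way to reason about this balance is to look at how much raw capacity each failure domain holds. The sketch below assumes the host is the failure domain and uses made‑up host names and drive sizes. It only reports capacity shares and does not model CRUSH placement or replication.

```python
def capacity_share_by_host(osd_tb_by_host):
    """Return each host's share of total raw capacity (values sum to 1.0).

    osd_tb_by_host maps a host name to a list of OSD sizes in TB.
    """
    total = sum(sum(osds) for osds in osd_tb_by_host.values())
    if total <= 0:
        raise ValueError("cluster has no capacity")
    return {host: sum(osds) / total for host, osds in osd_tb_by_host.items()}


# Hypothetical cluster: three hosts with 12 x 4 TB OSDs, one with 24 x 4 TB.
cluster = {
    "host-a": [4] * 12,
    "host-b": [4] * 12,
    "host-c": [4] * 12,
    "host-d": [4] * 24,
}

for host, share in capacity_share_by_host(cluster).items():
    print(f"{host}: {share:.0%} of raw capacity")
```

Here losing host-d would take 40 % of the raw capacity offline at once, compared with 20 % for any of the smaller hosts.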

Minimum Hardware Recommendations

Ceph can run on inexpensive commodity hardware. Small production or development clusters can succeed with modest specs.

Key recommendations per daemon type:

ceph‑osd : at least 1 CPU core, sized at roughly 1 core per 200‑500 MB/s of throughput and 1 core per 1000‑3000 IOPS; ARM processors may need additional cores; 4 GB+ RAM per daemon (2‑4 GB often works but may be slow).

ceph‑mon : at least 1 CPU core; 2 GB+ RAM per daemon.

ceph‑mds : at least 1 CPU core; 2 GB+ RAM per daemon.
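The per‑core figures for ceph‑osd can be used for a rough core estimate. This sketch reads them as the load a single core can serve, which is an interpretation, and it defaults to the conservative end of each range. The function name and the example workload are assumptions.

```python
import math


def osd_cores_needed(target_mb_per_s, target_iops,
                     mb_per_s_per_core=200, iops_per_core=1000):
    """Estimate CPU cores for one OSD from throughput and IOPS targets.

    Returns the larger of the two requirements, with a minimum of one core.
    """
    by_throughput = math.ceil(target_mb_per_s / mb_per_s_per_core)
    by_iops = math.ceil(target_iops / iops_per_core)
    return max(1, by_throughput, by_iops)


# Example: an OSD expected to serve 400 MB/s and 5000 IOPS.
print(osd_cores_needed(400, 5000))  # IOPS dominates: 5 cores
```

Passing `mb_per_s_per_core=500` and `iops_per_core=3000` gives the optimistic end of the range instead.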

Each daemon should have a dedicated disk (SSD for journal/DB/WAL, HDD for object data) and at least a gigabit network interface (10 Gbps recommended).

Tips: If an OSD shares a single disk with the operating system, create a partition for volume storage that is separate from the OS partition. Where possible, keep the OS and volume storage on separate disks.

BlueStore consumes more memory than FileStore. See the v12.2.10 Luminous release notes: https://ceph.io/releases/v12-2-10-luminous-released/

The default osd_memory_target is 4294967296 bytes (4 GiB, about 4.29 GB).
