Ceph Performance Optimization: Methodology, Hardware and Software Tuning Guide
This article summarizes practical methodologies and detailed hardware and software tuning steps—including CPU, memory, network, SSD selection, BIOS settings, Linux kernel parameters, Ceph configuration options, PG calculation, and CRUSH map adjustments—to improve Ceph distributed storage performance.
The author shares a comprehensive summary of Ceph storage optimization and testing methods, acknowledging that most content is compiled from public sources and inviting feedback for improvement.
Optimization Methodology – Effective optimization requires a clear methodology, covering both hardware and software layers.
Hardware Layer
Hardware planning (processor, memory, network)
SSD selection
BIOS settings
Hardware Planning
Each Ceph OSD process should be bound to a dedicated CPU core; MON processes need less CPU, while MDS processes are CPU‑intensive. Recommended memory: 2 GB per MON/MDS and at least 1 GB (2 GB preferred) per OSD. A 10 GbE network is essential, with separate client and cluster networks if possible.
SSD Selection
SATA SSDs (e.g., Intel® SSD DC S3500) are commonly used for journals; PCIe SSDs offer higher performance but may not improve latency as expected. Refer to Sébastien Han’s guide for SSD journal suitability.
BIOS Settings
Enable Hyper‑Threading (HT) and VT.
Disable power‑saving modes and set the profile to Performance.
Disable NUMA (or bind OSD processes to specific CPUs and memory nodes). Example command to set CPU governor:
for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || continue; echo -n performance > $CPUFREQ; done
To turn off NUMA on CentOS, add numa=off to the kernel line in /etc/grub.conf .
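For example, on a CentOS 6‑style system the resulting entry might look like the following (the kernel version and root device are placeholders, not values from the original article):

```
kernel /vmlinuz-2.6.32-642.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root rhgb quiet numa=off
```

After editing, reboot for the change to take effect.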
Software Layer
Linux OS Tuning
Increase max PID: echo 4194303 > /proc/sys/kernel/pid_max and persist with echo "kernel.pid_max = 4194303" >> /etc/sysctl.conf
Enable jumbo frames: ifconfig eth0 mtu 9000 and persist with echo "MTU=9000" >> /etc/sysconfig/network-scripts/ifcfg-eth0
Set read_ahead (recommended 8192 KB): echo "8192" > /sys/block/sda/queue/read_ahead_kb
Disable swap usage: echo "vm.swappiness = 0" >> /etc/sysctl.conf
Choose I/O scheduler (noop for SSD, deadline for HDD): echo "noop" > /sys/block/sdX/queue/scheduler
Use cgroups to bind OSD processes to specific CPUs and memory nodes for better cache utilization and reduced NUMA impact.
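A minimal cpuset sketch of that binding (the cgroup name, CPU range, NUMA node, and OSD PID below are placeholders, assuming cgroup v1 with the cpuset controller mounted):

```
# Create a cpuset cgroup and pin one OSD process to cores 0-3 on NUMA node 0
mkdir -p /sys/fs/cgroup/cpuset/osd0
echo 0-3 > /sys/fs/cgroup/cpuset/osd0/cpuset.cpus   # CPUs this OSD may run on
echo 0   > /sys/fs/cgroup/cpuset/osd0/cpuset.mems   # memory node to allocate from
echo "$OSD_PID" > /sys/fs/cgroup/cpuset/osd0/tasks  # move the OSD into the group
```

Keeping an OSD's CPUs and memory on the same NUMA node avoids cross‑node memory access and improves cache locality.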
Ceph Configuration
Key sections of ceph.conf include:
[global] – basic cluster settings (fsid, mon hosts, authentication, network).
[osd] – filestore – parameters like filestore xattr use omap = true, queue limits, and sync intervals.
[osd] – journal – journal size and queue settings.
[osd] – osd config tuning – thread counts (e.g., osd op threads = 8, osd disk threads = 4).
[osd] – recovery tuning and client tuning – adjust recovery priority and client cache settings.
Disable debug logging to reduce overhead.
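An illustrative [osd] fragment tying these sections together (the values are examples for a FileStore‑era deployment, not prescriptive recommendations):

```
[osd]
filestore xattr use omap = true
filestore min sync interval = 10
filestore max sync interval = 15
osd journal size = 10240
osd op threads = 8
osd disk threads = 4
# silence debug logging to cut overhead
debug osd = 0/0
debug ms = 0/0
```

Tune these against your own workload measurements rather than copying them verbatim.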
PG Number
Calculate placement groups (PGs) based on OSD count: Total PGs = (Total_number_of_OSD * 100) / max_replication_count. Round up to the nearest power of two (e.g., 512 for 15 OSDs with replication factor 3, since 15 × 100 / 3 = 500).
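The calculation for the 15‑OSD example can be sketched in shell (rounding up to the next power of two, the common convention):

```shell
# Recommended PG count: (OSDs * 100) / replicas, rounded up to a power of two
osds=15
replicas=3
target=$(( osds * 100 / replicas ))   # 500 for this example
pgs=1
while [ "$pgs" -lt "$target" ]; do
  pgs=$(( pgs * 2 ))
done
echo "$pgs"   # 512
```

The same arithmetic scales to larger clusters; recompute whenever OSDs are added.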
CRUSH Map
CRUSH map adjustments depend on the specific deployment and require case‑by‑case analysis.
Other Factors
Performance can be impacted by a single slow disk; monitor OSD latency with ceph osd perf and consider removing underperforming OSDs.
Sample ceph.conf
(Full configuration excerpt omitted for brevity; includes global, mon, osd, and client sections with tuned parameters.)
Conclusion
Optimization is an ongoing iterative process; the methods presented are collected from various sources and should be continuously refined through practical experience.