Master Ceph: Complete Guide to Deploying and Managing a Production-Ready Distributed Storage Cluster
This comprehensive guide explains why Ceph is a leading software‑defined storage solution, details hardware and network design, walks through step‑by‑step deployment with cephadm, covers pool creation, monitoring, performance tuning, troubleshooting, scaling, backup, security hardening, and advanced automation for production environments.
Why Choose Ceph?
Ceph is a unified, open‑source storage platform that provides object (RADOS Gateway), block (RBD), and POSIX‑compatible file system (CephFS) services. Its core advantages include true decentralization with no single point of failure, seamless horizontal scaling to petabyte levels, automatic self‑healing, and a vibrant community that avoids vendor lock‑in.
Hardware Recommendations
Monitor nodes (≥3, odd number)
CPU: 4+ cores
Memory: 8GB+
Disk: 100GB SSD (OS)
Network: Dual 10GbE (redundant)
OSD nodes (≥6 for a starter cluster)
CPU: 1 core per OSD
Memory: 4GB per OSD (BlueStore)
Disk: Enterprise SSD or high‑rpm HDD
Network: Dual 10GbE (public + cluster)
MGR nodes (≥2)
CPU: 2 cores
Memory: 4GB
Disk: System disk only
Network Architecture Design
Separate client traffic from internal cluster traffic to prevent congestion.
# Public network (client access)
10.0.1.0/24
# Cluster network (data replication & heartbeat)
10.0.2.0/24
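Declaring both subnets in Ceph's configuration makes every daemon bind to the intended interfaces. A minimal sketch of the corresponding ceph.conf entries, using the example subnets above:
[global]
public_network = 10.0.1.0/24
cluster_network = 10.0.2.0/24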
Step‑by‑Step Deployment
Environment Preparation
# 1. System version (CentOS 8 example)
cat /etc/os-release
# 2. Time synchronization (critical)
systemctl enable --now chronyd
chronyc sources -v
# 3. Firewall configuration
firewall-cmd --zone=public --add-port=6789/tcp --permanent
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --reload
# 4. Disable SELinux
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
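Every node needs the same preparation. A small loop (assuming root SSH access to hosts named node1–node3, as used in the rest of this guide) keeps the steps consistent:
for host in node1 node2 node3; do
    ssh root@$host "systemctl enable --now chronyd && setenforce 0"
done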
Install cephadm Tool
# Install official binary
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
chmod +x cephadm
./cephadm add-repo --release octopus
./cephadm install
Bootstrap the Cluster
# Initialize the first monitor
cephadm bootstrap --mon-ip 10.0.1.10 --cluster-network 10.0.2.0/24
# Install Ceph CLI tools
cephadm install ceph-common
# Verify cluster status
ceph status
A successful bootstrap shows output similar to:
  cluster:
    id:     a7f64266-0894-4f1e-a635-d0aeaca0e993
    health: HEALTH_OK
Add OSD Nodes
# 1. Distribute SSH keys
ssh-copy-id root@node2
ssh-copy-id root@node3
# 2. Register hosts
ceph orch host add node2 10.0.1.11
ceph orch host add node3 10.0.1.12
# 3. List available disks
ceph orch device ls
# 4. Deploy OSD daemons
ceph orch daemon add osd node2:/dev/sdb
ceph orch daemon add osd node2:/dev/sdc
ceph orch daemon add osd node3:/dev/sdb
ceph orch daemon add osd node3:/dev/sdc
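Adding devices one by one gives fine-grained control. Alternatively, if every empty, unpartitioned disk on the registered hosts should become an OSD, cephadm can consume them in a single declarative step:
# Turn all eligible devices on managed hosts into OSDs
ceph orch apply osd --all-available-devices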
Create Storage Pools
# 1. Create a replicated pool (3 replicas)
ceph osd pool create mypool 128 128 replicated
# 2. Enable RBD application type
ceph osd pool application enable mypool rbd
# 3. Set CRUSH rule for rack‑level fault tolerance
ceph osd crush rule create-replicated rack_rule default rack
ceph osd pool set mypool crush_rule rack_rule
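Before writing data, it is worth confirming that the pool actually picked up the replica count and the new CRUSH rule:
ceph osd pool get mypool size        # expect: size: 3
ceph osd pool get mypool crush_rule  # expect: crush_rule: rack_rule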
Monitoring and Performance Tuning
Key Monitoring Commands
# Cluster health details
ceph health detail
# Storage usage
ceph df
# OSD performance stats
ceph osd perf
# Slow request monitoring (per-OSD admin socket)
ceph daemon osd.0 dump_historic_slow_ops
# Placement Group status
ceph pg stat
Optimization Parameters (in /etc/ceph/ceph.conf)
[global]
# Network tuning
ms_bind_port_max = 7300
ms_bind_port_min = 6800
# OSD tuning
osd_max_write_size = 512
osd_client_message_size_cap = 2147483648
osd_deep_scrub_interval = 2419200
osd_scrub_max_interval = 604800
# BlueStore tuning
bluestore_cache_size_hdd = 4294967296
bluestore_cache_size_ssd = 8589934592
# Recovery control
osd_recovery_max_active = 5
osd_max_backfills = 2
osd_recovery_op_priority = 2
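On a cephadm-managed cluster the same parameters can also be applied at runtime through the centralized configuration store, avoiding per-host file edits. For example:
ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_max_active 5
ceph config set osd bluestore_cache_size_ssd 8589934592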
Troubleshooting Cases
Case 1 – OSD Down
# Check health details
ceph health detail
# Locate down OSD
ceph osd tree | grep down
# Inspect OSD logs
journalctl -u ceph-osd@3 -f
# Restart OSD (on a cephadm deployment: ceph orch daemon restart osd.3)
systemctl restart ceph-osd@3
# If the disk has failed, mark the OSD out and replace it
ceph osd out 3
Case 2 – Inconsistent PG
# Find inconsistent PGs
ceph pg dump | grep inconsistent
# Repair the PG
ceph pg repair 2.3f
# Deep scrub for thorough cleanup
ceph pg deep-scrub 2.3f
Case 3 – Disk Space Exhaustion
# Check usage details
ceph df detail
# Identify the largest pools
ceph osd pool ls detail
# Temporarily raise full thresholds (use with care; restore them afterwards)
ceph osd set-full-ratio 0.95
ceph osd set-backfillfull-ratio 0.90
ceph osd set-nearfull-ratio 0.85
# Long‑term fix: add OSDs or purge data
ceph orch daemon add osd node4:/dev/sdb
Capacity Planning & Expansion
Capacity Formula
Usable Capacity = (Raw Capacity ÷ Replication Factor) × (1 − Reserved Ratio)
# Example: 100 TB raw, 3-replica, 10% reserve → 100/3 × 0.9 ≈ 30 TB usable
Smooth Expansion Procedure
# 1. Limit backfills before adding new OSDs
ceph config set global osd_max_backfills 1
ceph config set global osd_recovery_max_active 1
# 2. Add OSDs one by one
ceph orch daemon add osd node5:/dev/sdb
# Wait for data rebalance
ceph -w
# 3. Restore default settings
ceph config rm global osd_max_backfills
ceph config rm global osd_recovery_max_active
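Between steps 2 and 3 the cluster must finish rebalancing. A simple polling loop (a sketch, not official tooling) can gate the restore of the defaults:
# Block until the cluster returns to HEALTH_OK
until ceph health | grep -q HEALTH_OK; do
    sleep 60
done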
Backup & Disaster Recovery
RBD Snapshot Backup
# Create snapshot
rbd snap create mypool/myimage@snapshot1
# Export snapshot
rbd export mypool/myimage@snapshot1 /backup/myimage.snapshot1
# Enable cross‑cluster mirroring
rbd mirror pool enable mypool image
rbd mirror image enable mypool/myimage
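The exported file can later be restored into any reachable cluster with rbd import; the target image name here (myimage-restored) is just an example:
rbd import /backup/myimage.snapshot1 mypool/myimage-restored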
Cluster-Level Backup
# Export configuration
ceph config dump > /backup/ceph-config.dump
# Backup CRUSH map
ceph osd getcrushmap -o /backup/crushmap.bin
# Backup monitor map (monmap)
ceph mon getmap -o /backup/monmap
Advanced Operations
Automation Scripts
#!/bin/bash
# ceph-health-check.sh
LOG_FILE="/var/log/ceph-health.log"
ALERT_EMAIL="admin@example.com"   # replace with a real address

check_health() {
    HEALTH=$(ceph health --format json | jq -r '.status')
    if [ "$HEALTH" != "HEALTH_OK" ]; then
        echo "$(date): Cluster health is $HEALTH" >> "$LOG_FILE"
        ceph health detail >> "$LOG_FILE"
        echo "Ceph cluster health issue detected" | mail -s "Ceph Alert" "$ALERT_EMAIL"
    fi
}

check_capacity() {
    # Fraction of raw capacity in use, computed from the `ceph df` JSON stats
    USAGE=$(ceph df --format json | jq '.stats.total_used_bytes / .stats.total_bytes')
    THRESHOLD=0.80
    if (( $(echo "$USAGE > $THRESHOLD" | bc -l) )); then
        echo "$(date): Storage usage is $USAGE" >> "$LOG_FILE"
        echo "Storage capacity warning" | mail -s "Ceph Capacity Alert" "$ALERT_EMAIL"
    fi
}

main() { check_health; check_capacity; }
main
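One way to schedule the script is a cron entry, e.g. every 15 minutes (the install path below is an assumption):
# /etc/cron.d/ceph-health-check
*/15 * * * * root /usr/local/bin/ceph-health-check.sh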
Performance Benchmarks
# RADOS benchmark
rados bench -p mypool 60 write --no-cleanup
rados bench -p mypool 60 seq
rados bench -p mypool 60 rand
# RBD benchmark
rbd create --size 10G mypool/test-image
rbd map mypool/test-image
fio --name=rbd-test --rw=randwrite --bs=4k --size=1G --filename=/dev/rbd0
# CephFS benchmark
mkdir /mnt/cephfs/test
fio --name=cephfs-test --rw=write --bs=1M --size=1G --directory=/mnt/cephfs/test
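Because the write benchmark ran with --no-cleanup, remove the leftover test objects and the scratch image afterwards:
rados -p mypool cleanup
rbd unmap /dev/rbd0
rbd rm mypool/test-image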
Security Hardening
# Enable authentication
ceph config set mon auth_cluster_required cephx
ceph config set mon auth_service_required cephx
ceph config set mon auth_client_required cephx
# Create a dedicated backup user
ceph auth get-or-create client.backup mon 'allow r' osd 'allow rwx pool=mypool'
# Enable encrypted network traffic
ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
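To verify that the backup user was created with exactly the intended capabilities:
ceph auth get client.backup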
Log Management
# Log rotation configuration (/etc/logrotate.d/ceph)
/var/log/ceph/*.log {
daily
rotate 30
compress
sharedscripts
postrotate
    # Targets do not support reload; send SIGHUP so daemons reopen their logs
    killall -q -1 ceph-mon ceph-mgr ceph-osd ceph-mds radosgw || true
endscript
}
# Adjust log verbosity
ceph config set global debug_osd 1/5
ceph config set global debug_mon 1/5
Upgrade Strategy
# Pre‑upgrade health check
ceph status
ceph versions
# Perform rolling upgrade of OSDs
ceph orch upgrade start --ceph-version 15.2.14
# Monitor upgrade progress
ceph orch upgrade status
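If anything looks wrong mid-rollout, the orchestrator can pause or abort the upgrade:
ceph orch upgrade pause    # halt temporarily
ceph orch upgrade resume   # continue a paused upgrade
ceph orch upgrade stop     # abort the upgrade entirely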
Key Takeaways
Architecture Design: Proper hardware selection and network segregation are fundamental to a stable Ceph cluster.
Monitoring & Operations: Continuous health checks, metric collection, and alerting prevent issues before they impact services.
Performance Tuning: Adjusting OSD, BlueStore, and recovery parameters tailors the cluster to specific workloads.
Fault Handling: Rapid diagnosis using health detail, the OSD tree, and log inspection is essential for high availability.