How to Fix Ceph Nearfull Warnings and Master PG/OSD Management
This guide explains why Ceph reports nearfull OSD warnings, how to adjust the monitor thresholds, and how to reweight OSDs automatically and manually. It also covers how to interpret PG and OSD states and how to perform essential cluster operations with the appropriate ceph commands: adding and removing OSDs, and managing pools, users, and monitors.
Common Questions
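When the health warning "nearfull osd(s) or pool(s) nearfull" appears, at least one OSD has exceeded the configured nearfull threshold; the monitors track the space usage of every OSD. Raising the thresholds in the configuration does not always clear the warning and does not address its cause. On Luminous and later releases the active ratios are stored in the OSD map and are changed with ceph osd set-nearfull-ratio and ceph osd set-full-ratio. Analyzing how data is distributed across the OSDs, and rebalancing it, is more effective.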
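As a rough illustration of how the two thresholds partition OSD usage, the following sketch classifies a single utilization value. The function name classify_osd is hypothetical, and the defaults of 0.85 and 0.95 are assumptions taken from the ratios quoted in this article; it does not query a cluster.

```shell
# classify_osd: print ok, nearfull, or full for one OSD (illustrative helper).
# $1 = utilization as a fraction, e.g. 0.87
# $2 = nearfull ratio (assumed default 0.85)
# $3 = full ratio (assumed default 0.95)
classify_osd() {
    awk -v u="$1" -v near="${2:-0.85}" -v full="${3:-0.95}" 'BEGIN {
        if (u >= full)      print "full"
        else if (u >= near) print "nearfull"
        else                print "ok"
    }'
}
```

For example, classify_osd 0.87 prints nearfull, while classify_osd 0.87 0.90 prints ok: the second call shows why raising the threshold silences the warning without changing how full the OSD actually is.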
Configuration file thresholds
"mon_osd_full_ratio":"0.95",
"mon_osd_nearfull_ratio":"0.85"
Automatic handling
ceph osd reweight-by-utilization
ceph osd reweight-by-pg 105 cephfs_data    # 105 = overload threshold in percent, cephfs_data = pool name
Manual handling
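A manual reweight needs a target value. One rough way to pick it is to scale the OSD's current override weight by the ratio of average utilization to that OSD's utilization, and to limit how far a single step may move. The sketch below is a simplified heuristic under those assumptions, not Ceph's actual algorithm; suggest_reweight and the 0.05 step limit are hypothetical choices made for illustration.

```shell
# suggest_reweight: propose a new override weight for one OSD (illustrative).
# $1 = current override weight (0..1)
# $2 = this OSD's utilization in percent
# $3 = average utilization across OSDs in percent
# $4 = largest change allowed in one step (assumed default 0.05)
suggest_reweight() {
    awk -v w="$1" -v u="$2" -v avg="$3" -v step="${4:-0.05}" 'BEGIN {
        if (u <= 0) { printf "%.2f\n", w; exit }    # nothing stored: keep weight
        target = w * avg / u                         # scale toward the average
        if (target > 1) target = 1                   # override weights stop at 1
        if (target < w - step) target = w - step     # limit a downward step
        if (target > w + step) target = w + step     # limit an upward step
        printf "%.2f\n", target
    }'
}
```

With an OSD at 90% and a cluster average of 70%, suggest_reweight 1.0 90 70 proposes 0.95. Small steps keep the amount of data that moves at once manageable; the printed value is only a suggestion to pass to ceph osd reweight.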
ceph osd reweight osd.2 0.8
Global handling
ceph mgr module ls
ceph mgr module enable balancer
ceph balancer on
ceph balancer mode crush-compat
ceph config-key set "mgr/balancer/max_misplaced" "0.01"
PG fault states
PG state overview
A PG can be in various states during its lifecycle:
Creating – PG is being created when a pool is defined.
Peering – The OSDs that hold the PG communicate and agree on the state of its objects.
Active – Peering is complete and the PG can serve read and write requests.
Clean – Every object has the required number of replicas and all replicas are in sync.
Degraded – Some objects have fewer replicas than required, for example because an OSD is down.
Recovering – A previously down OSD has come back and is being brought up to date.
Backfilling – A new OSD has joined and is receiving its share of the data.
Remapped – The acting set has changed and the PG is migrating to the new set.
Stale – The monitors have not received a recent report from the primary OSD of the acting set.
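Ceph reports PG states as flags joined with a plus sign, such as active+clean in the status output shown later in this article. The following sketch lists every flag in such a string other than active and clean, which is a quick way to see what needs attention. The helper name pg_problem_flags is hypothetical, and it works on a string you pass in rather than on live cluster output.

```shell
# pg_problem_flags: print each flag of a PG state string that is not part
# of the healthy steady state "active+clean" (illustrative helper).
# $1 = state string, e.g. active+remapped+backfilling
pg_problem_flags() {
    printf '%s\n' "$1" | tr '+' '\n' | while read -r flag; do
        case "$flag" in
            active|clean|"") ;;              # healthy flags: print nothing
            *) printf '%s\n' "$flag" ;;      # anything else is worth a look
        esac
    done
}
```

Calling pg_problem_flags active+clean prints nothing, while pg_problem_flags active+remapped+backfilling prints remapped and backfilling on separate lines. Note that this simple filter also reports transient, harmless flags such as scrubbing.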
OSD states
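Each OSD has two independent status dimensions: in/out indicates whether the OSD is a member of the cluster and receives data, and up/down indicates whether its daemon is running. Any combination of the two can occur.
in & up – normal, OSD is part of the cluster and running.
in & down – OSD is in the cluster but its daemon is down; after 300 s (the mon_osd_down_out_interval setting, whose default differs between releases) it is marked out and becomes out & down.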
out & up – newly added OSD, daemon running but not yet in the cluster.
out & down – OSD removed from cluster and daemon not running; CRUSH will not place PGs on it.
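The four combinations above can be written as a small lookup, which is handy when scripting around OSD status. This is only a sketch: describe_osd_state is a hypothetical helper that takes the two dimensions as plain words and does not query the cluster.

```shell
# describe_osd_state: explain one in/out and up/down combination (illustrative).
# $1 = in|out (cluster membership), $2 = up|down (daemon status)
describe_osd_state() {
    case "$1/$2" in
        in/up)    echo "normal: member of the cluster and running" ;;
        in/down)  echo "daemon down: will be marked out after the timeout" ;;
        out/up)   echo "running but not in the cluster: receives no data" ;;
        out/down) echo "removed and stopped: CRUSH places no PGs here" ;;
        *)        echo "unknown state: $1/$2" >&2; return 1 ;;
    esac
}
```

For example, describe_osd_state in down prints the second message, and any input outside the four combinations returns a non-zero status.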
Cluster monitoring and management
Overall cluster status can be inspected with commands such as:
# ceph -s
  cluster:
    id: 8230a918-a0de-4784-9ab8-cd2a2b8671d0
    health: HEALTH_WARN
  services:
    mon: 3 daemons, quorum cephnode01,cephnode02,cephnode03 (age 27h)
    mgr: cephnode01 (active, since 53m), standbys: cephnode03, cephnode02
    osd: 4 osds: 4 up (since 27h), 4 in (since 19h)
    rgw: 1 daemon active (cephnode01)
  data:
    pools: 6 pools, 96 pgs
    objects: 235 objects, 3.6KiB
    usage: 4.0GiB used, 56GiB/60GiB avail
    pgs: 96 active+clean
Additional useful commands:
# ceph -w
# ceph health detail
# ceph pg dump
# ceph pg stat
# ceph osd pool stats
# ceph osd stat
# ceph osd dump
# ceph osd tree
# ceph osd df
# ceph mon stat
# ceph mon dump
# ceph quorum_status
# ceph df
# ceph df detail
Cluster configuration management (runtime changes without a service restart)
To view or modify a daemon's configuration without restarting the service, use the tell and daemon sub‑commands.
# ceph daemon {daemon-type}.{id} config show
# ceph daemon osd.0 config show
Tell command format
The tell command sends settings to a specific daemon or, with * as a wildcard, to every daemon of that type in the cluster. Errors are reported directly on the command line.
# ceph tell {daemon-type}.{daemon id or *} injectargs --{name}={value} [--{name}={value}]
# ceph tell osd.0 injectargs '--debug-osd 20 --debug-ms 1'
Parameters:
daemon-type : osd, mon, mds, etc.
daemon id : numeric ID for OSD, monitor name for mon, or * for all.
injectargs : injects one or more arguments.
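To make the format above concrete, the following sketch assembles a tell command from a target and a list of name=value pairs and prints it without running it, so the result can be reviewed first. build_injectargs is a hypothetical helper, not part of the ceph CLI, and it does not check that the option names are valid.

```shell
# build_injectargs: print (do not run) a "ceph tell ... injectargs" command.
# $1 = target such as osd.0 or osd.*; remaining arguments = name=value pairs.
build_injectargs() {
    target=$1
    shift
    args=""
    for pair in "$@"; do
        case "$pair" in
            *=*) args="${args:+$args }--$pair" ;;   # append as --name=value
            *)   echo "expected name=value, got: $pair" >&2; return 1 ;;
        esac
    done
    [ -n "$args" ] || { echo "no settings given" >&2; return 1; }
    # Quote the injected arguments so the shell passes them as one string.
    printf "ceph tell %s injectargs '%s'\n" "$target" "$args"
}
```

For example, build_injectargs osd.0 debug-osd=20 debug-ms=1 prints a command equivalent to the osd.0 example in this section. Quote a wildcard target, as in build_injectargs 'osd.*' debug-osd=20, so the shell does not expand it.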
Daemon command
The daemon sub-command sets configuration on a single daemon through its local admin socket, so it must be run on the host where that daemon is running. It provides immediate feedback.
# ceph daemon {daemon-type}.{id} config set {name} {value}
# ceph daemon mon.ceph-monitor-1 config set mon_allow_pool_delete false
Cluster operations
# systemctl start ceph.target
# systemctl start ceph-mgr.target
# systemctl start ceph-osd@{id}
# systemctl start ceph-mon.target
# systemctl start ceph-mds.target
# systemctl start ceph-radosgw.target
Adding and removing OSDs
Adding
# ceph-volume lvm zap /dev/sd<id>
# ceph-deploy osd create --data /dev/sd<id> $hostname
Removing
# ceph osd crush reweight osd.<ID> 0.0
# systemctl stop ceph-osd@<ID>
# ceph osd out <ID>
# ceph osd purge osd.<ID> --yes-i-really-mean-it
# umount /var/lib/ceph/osd/ceph-<ID>
Expanding PGs
ceph osd pool set {pool-name} pg_num 128
ceph osd pool set {pool-name} pgp_num 128
Note: PG and PGP numbers should be powers of two and kept equal to allow proper rebalancing.
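The power-of-two rule in the note is easy to check before changing a pool. The following sketch tests a candidate value and rounds it up to the next power of two; both function names are hypothetical helpers, and the candidate values in the example are made up.

```shell
# is_power_of_two: succeed (exit 0) if $1 is a positive power of two.
is_power_of_two() {
    n=$1
    [ "$n" -gt 0 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do n=$((n / 2)); done   # strip factors of 2
    [ "$n" -eq 1 ]
}

# next_power_of_two: print the smallest power of two that is >= $1.
next_power_of_two() {
    p=1
    while [ "$p" -lt "$1" ]; do p=$((p * 2)); done
    echo "$p"
}
```

For a planned value of 100, is_power_of_two 100 fails and next_power_of_two 100 prints 128, which is the value used for both pg_num and pgp_num in the commands above.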
Pool operations
# ceph osd lspools
# ceph osd pool create {pool-name} {pg-num} [{pgp-num}]
# ceph osd pool set-quota {pool-name} max_objects 10000
# ceph osd pool delete {pool-name} {pool-name} --yes-i-really-really-mean-it
# ceph osd pool rename {current-pool-name} {new-pool-name}
# rados df
# ceph osd pool mksnap {pool-name} {snap-name}
# ceph osd pool rmsnap {pool-name} {snap-name}
# ceph osd pool get {pool-name} {key}
# ceph osd pool set {pool-name} {key} {value}
# ceph osd dump | grep 'replicated size'
User management
# ceph auth list
# ceph auth get client.admin
# ceph auth print-key client.admin
# ceph auth add client.john mon 'allow r' osd 'allow rw pool=liverpool'
# ceph auth get-or-create client.paul mon 'allow r' osd 'allow rw pool=liverpool'
# ceph auth caps client.john mon 'allow r' osd 'allow rw pool=liverpool'
# ceph auth del {TYPE}.{ID}
Adding and removing Monitors
# ceph-deploy mon create $hostname
# ceph-deploy mon destroy $hostname
It is recommended to run an odd number of monitors (at least three in production) to maintain quorum.
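The reason for the odd-number recommendation follows from majority arithmetic: quorum needs more than half of the monitors, so adding one monitor to an odd-sized set does not increase the number of failures the cluster survives. A minimal sketch of that calculation, with hypothetical function names:

```shell
# quorum_size: print the number of monitors needed for a majority of $1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures: print how many of $1 monitors can fail with quorum kept.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

Three monitors need two for quorum and tolerate one failure; four monitors need three and still tolerate only one; five tolerate two.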
MaGe Linux Operations
