How to Test and Tune Ceph RBD QoS: A Step‑by‑Step Guide
This article explains Ceph RBD QoS concepts, describes a full testing environment and workflow, shows command‑line configurations and fio benchmarks, and summarizes the findings that image‑level QoS limits are effective while pool‑level limits are not.
About Ceph QoS
Ceph is a highly scalable distributed storage system that has become a cornerstone of cloud computing and big‑data infrastructures. As clusters grow and workloads diversify, ensuring effective resource allocation and performance isolation becomes critical, making Ceph's Quality of Service (QoS) features especially important.
Test Environment
Operating System: Ubuntu 20.04
Kernel: 5.4.0-163-generic
CPU / Memory: 32 Cores / 128 GB
Disk: 10 TB
Ceph version: 17.2.5 Quincy (stable)
Test Process
Launch a VM that uses a Ceph RBD image
Run fio without any speed limits to obtain baseline IOPS and BPS
Enable image‑level QoS IOPS limit and test the image
Enable image‑level QoS BPS limit and test the image
Enable pool‑level QoS IOPS limit and test the pool
Enable pool‑level QoS BPS limit and test the pool
Test qemu block‑device BPS limiting
Test qemu block‑device IOPS limiting
Remove QoS settings and retest to verify restoration
Test Steps
RBD QoS has been supported since Ceph 14 (Nautilus). Detailed configuration parameters are documented in the Ceph RBD config reference. QoS is enforced in the librbd layer.
View current image QoS configuration:
<code>rbd -p libvirt-pool config image ls scan.img|grep qos</code>
View pool QoS configuration (applies to the total of all images in the pool):
<code>rbd config pool ls libvirt-pool|grep qos</code>
When no limits are set, both BPS and IOPS default to 0 (unlimited). The parameter <code>rbd_qos_schedule_tick_min=50</code> defines a 50 ms minimum scheduling interval. <code>rbd_qos_write_iops_burst_seconds=1</code> allows a one-second burst above the IOPS limit; the same logic applies to BPS.
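As a sketch of how burst capacity is granted alongside a steady-state limit (using the <code>rbd_qos_iops_burst</code> and <code>rbd_qos_iops_burst_seconds</code> options from the Ceph RBD config reference; the image and pool names match those used below):

```shell
# Cap the image at 100 IOPS steady state, but allow bursts of up to
# 200 IOPS sustained for at most 1 second.
rbd config image set libvirt-pool/scan.img rbd_qos_iops_limit 100
rbd config image set libvirt-pool/scan.img rbd_qos_iops_burst 200
rbd config image set libvirt-pool/scan.img rbd_qos_iops_burst_seconds 1
```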
IOPS : total I/O operations per second
read IOPS : read operations per second
write IOPS : write operations per second
BPS : total bytes transferred per second
read BPS : bytes read per second
write BPS : bytes written per second
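Each of these six dimensions has a corresponding image-level option, so read and write traffic can be capped independently. A sketch with example values (option names per the Ceph RBD config reference):

```shell
# Limit reads and writes separately on a single image
rbd config image set libvirt-pool/scan.img rbd_qos_read_iops_limit 500
rbd config image set libvirt-pool/scan.img rbd_qos_write_iops_limit 200
rbd config image set libvirt-pool/scan.img rbd_qos_read_bps_limit 10485760   # 10 MiB/s
rbd config image set libvirt-pool/scan.img rbd_qos_write_bps_limit 5242880   # 5 MiB/s
```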
Using fio for read/write testing
fio is a flexible I/O performance testing tool widely used to evaluate disk and filesystem performance. Below is a basic command that performs a 60‑second random read/write test with four parallel jobs.
<code>fio --name=randrw_test --ioengine=libaio --iodepth=1 --rw=randrw --rwmixread=50 --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting</code>
Parameter explanation:
--name=randrw_test: test name
--ioengine=libaio: use Linux asynchronous I/O
--iodepth=1: one I/O in flight per job
--rw=randrw: random read/write workload
--rwmixread=50: 50 % reads, 50 % writes
--bs=4k: block size of 4 KB
--direct=1: bypass page cache
--size=1G: size of the test file per job
--numjobs=4: four concurrent jobs
--runtime=60: run for 60 seconds
--group_reporting: aggregate results across jobs
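These parameters also fix the arithmetic used to sanity-check the limits later: with 4 KiB blocks, an IOPS cap implies a throughput ceiling of roughly IOPS × block size. A quick check:

```shell
# Throughput ceiling implied by an IOPS cap at 4 KiB blocks:
#   bytes/s = IOPS limit * block size
iops_limit=100
block_size=4096   # matches --bs=4k
echo $((iops_limit * block_size))
```

So a 100 IOPS image limit should hold 4 KiB traffic to about 409600 bytes/s (roughly 400 KiB/s).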
Baseline test (no QoS limits) shows the raw IOPS and BPS of the RBD image.
<code>rbd perf image iostat --pool libvirt-pool</code>
Enable image QoS IOPS limit (100 IOPS) and retest
<code>rbd -p libvirt-pool config image set scan.img rbd_qos_iops_limit 100</code>
Run the same fio command on the VM and then check the statistics:
<code>rbd perf image iostat --pool libvirt-pool</code>
The IOPS are capped at 100 as expected.
Enable image QoS BPS limit (100 KB/s) and retest
<code>rbd -p libvirt-pool config image set scan.img rbd_qos_iops_limit 0
rbd -p libvirt-pool config image set scan.img rbd_qos_bps_limit 100000</code>
After running the write-heavy fio test, the throughput does not exceed 100 KB/s (100,000 bytes/s).
Enable pool QoS IOPS limit (200 IOPS) and retest
<code>rbd -p libvirt-pool config image set scan.img rbd_qos_bps_limit 0
rbd config pool set libvirt-pool rbd_qos_iops_limit 200</code>
The pool-level IOPS limit shows little effect on the individual image's performance.
Enable pool QoS BPS limit (1 MB/s) and retest
<code>rbd config pool set libvirt-pool rbd_qos_iops_limit 0
rbd config pool set libvirt-pool rbd_qos_bps_limit 1000000</code>
Similarly, the pool-level BPS limit does not noticeably affect the image's throughput.
Test qemu block‑device BPS limiting
Check the current QoS of the VM's block device (vdb):
<code>virsh blkdeviotune scan vdb</code>
Limit the device to 5 MB/s (5,000,000 bytes/s):
<code>virsh blkdeviotune scan vdb --total-bytes-sec 5000000 --live</code>
Test qemu block-device IOPS limiting
<code>virsh blkdeviotune scan vdb --total-bytes-sec 0 --live
virsh blkdeviotune scan vdb --total-iops-sec 1000 --live</code>
Run a heavy fio workload inside the VM:
<code>fio --name=Test --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting</code>
Remove QoS settings and retest
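The article does not list the exact removal commands; a plausible sketch using the <code>rbd config … rm</code> and <code>virsh blkdeviotune</code> forms, with the <code>scan.img</code>, <code>libvirt-pool</code>, and <code>vdb</code> names from the earlier steps:

```shell
# Clear the image- and pool-level RBD QoS keys set earlier
rbd config image rm libvirt-pool/scan.img rbd_qos_iops_limit
rbd config image rm libvirt-pool/scan.img rbd_qos_bps_limit
rbd config pool rm libvirt-pool rbd_qos_iops_limit
rbd config pool rm libvirt-pool rbd_qos_bps_limit
# Reset the qemu block-device throttles (0 means unlimited)
virsh blkdeviotune scan vdb --total-bytes-sec 0 --total-iops-sec 0 --live
```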
<code>fio --name=Test --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting</code>
Test Conclusion
For Ceph RBD QoS, applying limits at the image level (both IOPS and BPS) yields clear throttling effects, whereas pool‑level limits have little impact. Using qemu's blkdeviotune to limit VM block devices also works and can be applied to local disks, offering a flexible alternative to Ceph‑only QoS.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.