How to Test and Tune Ceph RBD QoS: A Step‑by‑Step Guide

This article explains Ceph RBD QoS concepts, describes a full testing environment and workflow, shows command‑line configurations and fio benchmarks, and summarizes the findings that image‑level QoS limits are effective while pool‑level limits are not.

Ops Development Stories

About Ceph QoS

Ceph is a highly scalable distributed storage system that has become a cornerstone of cloud computing and big‑data infrastructures. As clusters grow and workloads diversify, ensuring effective resource allocation and performance isolation becomes critical, making Ceph's Quality of Service (QoS) features especially important.

Test Environment

Operating System: Ubuntu 20.04

Kernel: 5.4.0-163-generic

CPU / Memory: 32 Cores / 128 GB

Disk: 10 TB

Ceph version: 17.2.5 Quincy (stable)

Test Process

Launch a VM that uses a Ceph RBD image

Run fio without any speed limits to obtain baseline IOPS and BPS

Enable image‑level QoS IOPS limit and test the image

Enable image‑level QoS BPS limit and test the image

Enable pool‑level QoS IOPS limit and test the pool

Enable pool‑level QoS BPS limit and test the pool

Test qemu block‑device BPS limiting

Test qemu block‑device IOPS limiting

Remove QoS settings and retest to verify restoration

Test Steps

RBD QoS has been supported since Ceph 14 (Nautilus). Detailed configuration parameters are documented in the Ceph RBD config reference. QoS is enforced in the librbd layer, that is, on the client side.

View current image QoS configuration:

rbd -p libvirt-pool config image ls scan.img | grep qos

View pool QoS configuration (applies to the total of all images in the pool):

rbd config pool ls libvirt-pool | grep qos

When no limits are set, both BPS and IOPS default to 0 (unlimited). The parameter rbd_qos_schedule_tick_min=50 sets a minimum scheduling interval of 50 ms. rbd_qos_write_iops_burst_seconds=1 allows a one‑second burst above the write IOPS limit, and the same logic applies to BPS.

IOPS: total I/O operations per second

read IOPS: read operations per second

write IOPS: write operations per second

BPS: total bytes transferred per second

read BPS: bytes read per second

write BPS: bytes written per second
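For a fixed block size the two metrics are tied together: BPS equals IOPS multiplied by the block size. The following is a minimal sketch of that arithmetic; the function names are illustrative and not part of any Ceph tool, and the sample values are the 4 KiB block size and the limits used later in this article.

```shell
#!/bin/sh
# Convert between IOPS and BPS for a fixed block size (integer arithmetic only).

# bps_from_iops IOPS BLOCK_SIZE_BYTES -> bytes per second
bps_from_iops() {
    echo $(( $1 * $2 ))
}

# iops_from_bps BPS BLOCK_SIZE_BYTES -> whole I/O operations per second (rounded down)
iops_from_bps() {
    echo $(( $1 / $2 ))
}

bps_from_iops 100 4096      # 100 IOPS at 4 KiB blocks -> 409600 bytes/s
iops_from_bps 100000 4096   # 100000 bytes/s at 4 KiB blocks -> 24 IOPS
```

This helps when reading the results: with 4 KiB blocks, a limit of 100 IOPS also implies roughly 400 KiB/s, and a limit of 100000 bytes/s implies about 24 IOPS.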

Using fio for read/write testing

fio is a flexible I/O performance testing tool widely used to evaluate disk and filesystem performance. Below is a basic command that performs a 60‑second random read/write test with four parallel jobs.

fio --name=randrw_test --ioengine=libaio --iodepth=1 --rw=randrw --rwmixread=50 --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting

Parameter explanation

--name=randrw_test: test name

--ioengine=libaio: use Linux asynchronous I/O

--iodepth=1: one I/O in flight per job

--rw=randrw: random read/write workload

--rwmixread=50: 50 % reads, 50 % writes

--bs=4k: block size of 4 KB

--direct=1: bypass page cache

--size=1G: size of the test file per job

--numjobs=4: four concurrent jobs

--runtime=60: run for 60 seconds

--group_reporting: aggregate results across jobs

Baseline test (no QoS limits): run the fio command inside the VM, then read the raw IOPS and BPS of the RBD image from the cluster side:

rbd perf image iostat --pool libvirt-pool

Enable image QoS IOPS limit (100 IOPS) and retest

rbd -p libvirt-pool config image set scan.img rbd_qos_iops_limit 100

Run the same fio command on the VM and then check the statistics:

rbd perf image iostat --pool libvirt-pool

The IOPS are capped at 100 as expected.
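Measured values fluctuate a little around the configured limit, so a pass/fail check is easier with a tolerance. The sketch below is one hypothetical way to do that comparison; the function name, the 5 % tolerance, and the sample measurements are assumptions for illustration, not figures from this test.

```shell
#!/bin/sh
# within_limit MEASURED LIMIT TOLERANCE_PCT
# Exits 0 when MEASURED <= LIMIT * (1 + TOLERANCE_PCT / 100), otherwise 1.
# awk is used because POSIX shell arithmetic has no floating point.
within_limit() {
    awk -v m="$1" -v l="$2" -v t="$3" \
        'BEGIN { exit !(m <= l * (1 + t / 100)) }'
}

# Hypothetical measurement of 98.7 IOPS against the 100 IOPS limit.
if within_limit 98.7 100 5; then
    echo "capped"
else
    echo "not capped"
fi
```

The measured value would be read by hand from the fio summary or from rbd perf image iostat.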

Enable image QoS BPS limit (100 KB/s, that is 100000 bytes/s) and retest

rbd -p libvirt-pool config image set scan.img rbd_qos_iops_limit 0
rbd -p libvirt-pool config image set scan.img rbd_qos_bps_limit 100000

After running the write‑heavy fio test, the throughput does not exceed 100 KB/s.
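The limit is given in bytes per second, so decimal units (KB, MB) and binary units (KiB, MiB) produce slightly different values. A small converter such as the following sketch can help avoid mixing them up; to_bytes is an illustrative name, not an rbd subcommand.

```shell
#!/bin/sh
# to_bytes VALUE UNIT -> bytes
# Decimal units (KB, MB) are powers of 1000; binary units (KiB, MiB) are powers of 1024.
to_bytes() {
    case "$2" in
        B)   echo "$1" ;;
        KB)  echo $(( $1 * 1000 )) ;;
        MB)  echo $(( $1 * 1000 * 1000 )) ;;
        KiB) echo $(( $1 * 1024 )) ;;
        MiB) echo $(( $1 * 1024 * 1024 )) ;;
        *)   echo "unknown unit: $2" >&2; return 1 ;;
    esac
}

to_bytes 100 KB    # 100000, the value passed to rbd_qos_bps_limit above
to_bytes 100 KiB   # 102400
```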

Enable pool QoS IOPS limit (200 IOPS) and retest

rbd -p libvirt-pool config image set scan.img rbd_qos_bps_limit 0
rbd config pool set libvirt-pool rbd_qos_iops_limit 200

The pool‑level IOPS limit shows little effect on the individual image performance.

Enable pool QoS BPS limit (1 MB/s, that is 1000000 bytes/s) and retest

rbd config pool set libvirt-pool rbd_qos_iops_limit 0
rbd config pool set libvirt-pool rbd_qos_bps_limit 1000000

Similarly, the pool‑level BPS limit does not noticeably affect the image throughput.

Test qemu block‑device BPS limiting

Check the current QoS of the VM's block device (vdb):

virsh blkdeviotune scan vdb

Limit the device to 5 MB/s (5000000 bytes/s):

virsh blkdeviotune scan vdb --total-bytes-sec 5000000 --live

Test qemu block‑device IOPS limiting

virsh blkdeviotune scan vdb --total-bytes-sec 0 --live
virsh blkdeviotune scan vdb --total-iops-sec 1000 --live

Run a heavy fio workload inside the VM:

fio --name=Test --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting

Remove QoS settings and retest
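The commands used to clear the limits are not listed here. The following is a minimal sketch, assuming the same pool, image, domain, and device names used in the earlier steps, and relying on the fact that a value of 0 means unlimited for both the RBD options and virsh blkdeviotune. The run wrapper and the APPLY variable are hypothetical conveniences: by default the script only prints the commands, and it executes them only when APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Names taken from the earlier steps in this article.
POOL=libvirt-pool
IMAGE=scan.img
DOMAIN=scan
DEVICE=vdb

reset_qos() {
    # Image-level and pool-level RBD limits: 0 means unlimited.
    for key in rbd_qos_iops_limit rbd_qos_bps_limit; do
        run rbd -p "$POOL" config image set "$IMAGE" "$key" 0
        run rbd config pool set "$POOL" "$key" 0
    done
    # qemu block-device throttles: 0 removes the limit.
    run virsh blkdeviotune "$DOMAIN" "$DEVICE" \
        --total-bytes-sec 0 --total-iops-sec 0 --live
}

reset_qos
```

With the limits cleared, rerun the same fio workload and compare the result with the baseline: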

fio --name=Test --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting

Test Conclusion

For Ceph RBD QoS, applying limits at the image level (both IOPS and BPS) yields clear throttling effects, whereas pool‑level limits have little impact. Throttling the VM's block devices at the qemu layer with virsh blkdeviotune also works, and because it is not tied to Ceph it can be applied to local disks as well, offering a flexible alternative to Ceph‑only QoS.
