
How to Test and Tune Ceph RBD QoS: A Step‑by‑Step Guide

This article explains Ceph RBD QoS concepts, describes a full testing environment and workflow, shows command‑line configurations and fio benchmarks, and summarizes the findings that image‑level QoS limits are effective while pool‑level limits are not.


About Ceph QoS

Ceph is a highly scalable distributed storage system that has become a cornerstone of cloud computing and big‑data infrastructures. As clusters grow and workloads diversify, ensuring effective resource allocation and performance isolation becomes critical, making Ceph's Quality of Service (QoS) features especially important.

Test Environment

Operating System: Ubuntu 20.04

Kernel: 5.4.0-163-generic

CPU / Memory: 32 Cores / 128 GB

Disk: 10 TB

Ceph version: 17.2.5 Quincy (stable)

Test Process

Launch a VM that uses a Ceph RBD image

Run fio without any speed limits to obtain baseline IOPS and BPS

Enable image‑level QoS IOPS limit and test the image

Enable image‑level QoS BPS limit and test the image

Enable pool‑level QoS IOPS limit and test the pool

Enable pool‑level QoS BPS limit and test the pool

Test qemu block‑device BPS limiting

Test qemu block‑device IOPS limiting

Remove QoS settings and retest to verify restoration

Test Steps

RBD QoS has been supported since Ceph 14 (Nautilus). Detailed configuration parameters are documented in the Ceph RBD config reference. QoS is enforced in the librbd layer, on the client side.

View current image QoS configuration:

<code>rbd -p libvirt-pool config image ls scan.img | grep qos</code>

View pool QoS configuration (applies to the total of all images in the pool):

<code>rbd config pool ls libvirt-pool | grep qos</code>

When no limits are set, all BPS and IOPS parameters default to 0 (unlimited). Two related parameters shape how the throttle behaves:

rbd_qos_schedule_tick_min=50 defines a 50 ms minimum scheduling interval.

rbd_qos_write_iops_burst_seconds=1 allows a one-second burst above the write IOPS limit; the corresponding BPS burst parameters work the same way.
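As a rough sketch of how these two knobs interact, assuming an illustrative 100 IOPS limit (not a value taken from the cluster above):

```shell
#!/bin/sh
# Illustrative numbers only: a hypothetical 100 IOPS limit with the defaults above.
LIMIT=100          # rbd_qos_iops_limit (ops/s)
TICK_MS=50         # rbd_qos_schedule_tick_min (ms)
BURST_SECONDS=1    # rbd_qos_write_iops_burst_seconds

# Ops the throttle can release per 50 ms scheduling tick at steady state
TOKENS_PER_TICK=$((LIMIT * TICK_MS / 1000))
# Extra ops permitted on top of the steady rate during a burst
BURST_OPS=$((LIMIT * BURST_SECONDS))

echo "ops per tick: $TOKENS_PER_TICK"
echo "burst allowance: $BURST_OPS ops"
```

So a coarse 50 ms tick releases I/O in batches of roughly five at this limit, which is one reason very low limits can feel bursty rather than smooth.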

IOPS: total I/O operations per second

read IOPS: read operations per second

write IOPS: write operations per second

BPS: total bytes transferred per second

read BPS: bytes read per second

write BPS: bytes written per second
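Each of the six metrics above maps to a librbd config key (names as in the Ceph RBD config reference; a value of 0 means unlimited):

```shell
#!/bin/sh
# The six librbd throttle keys corresponding to the metrics above.
QOS_KEYS="rbd_qos_iops_limit rbd_qos_read_iops_limit rbd_qos_write_iops_limit rbd_qos_bps_limit rbd_qos_read_bps_limit rbd_qos_write_bps_limit"

# Any of these can be set per image or per pool, e.g.:
#   rbd -p <pool> config image set <image> <key> <value>
#   rbd config pool set <pool> <key> <value>
for key in $QOS_KEYS; do
  echo "$key"
done
```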

Using fio for read/write testing

fio is a flexible I/O performance testing tool widely used to evaluate disk and filesystem performance. Below is a basic command that performs a 60‑second random read/write test with four parallel jobs.

<code>fio --name=randrw_test --ioengine=libaio --iodepth=1 --rw=randrw --rwmixread=50 --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting</code>

Parameter explanation

--name=randrw_test: test name

--ioengine=libaio: use Linux asynchronous I/O

--iodepth=1: one I/O in flight per job

--rw=randrw: random read/write workload

--rwmixread=50: 50 % reads, 50 % writes

--bs=4k: block size of 4 KB

--direct=1: bypass page cache

--size=1G: size of the test file per job

--numjobs=4: four concurrent jobs

--runtime=60: run for 60 seconds

--group_reporting: aggregate results across jobs

Baseline test (no QoS limits) shows the raw IOPS and BPS of the RBD image.

<code>rbd perf image iostat --pool libvirt-pool</code>

Enable image QoS IOPS limit (100 IOPS) and retest

<code>rbd -p libvirt-pool config image set scan.img rbd_qos_iops_limit 100</code>

Run the same fio command on the VM and then check the statistics:

<code>rbd perf image iostat --pool libvirt-pool</code>

The IOPS are capped at 100 as expected.
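A quick sanity check on what the iostat output should show under this cap, assuming the 4 KiB block size from the fio command above:

```shell
#!/bin/sh
# With IOPS capped at 100 and 4 KiB requests, bandwidth is implicitly capped too.
IOPS_LIMIT=100
BLOCK_KIB=4
EXPECTED_KIB_S=$((IOPS_LIMIT * BLOCK_KIB))
echo "expected throughput ceiling: ~${EXPECTED_KIB_S} KiB/s"
```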

Enable image QoS BPS limit (100 KB/s) and retest

<code>rbd -p libvirt-pool config image set scan.img rbd_qos_iops_limit 0
rbd -p libvirt-pool config image set scan.img rbd_qos_bps_limit 100000</code>

After running the write‑heavy fio test, the throughput does not exceed 100 KB/s.
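Conversely, a byte-rate cap implies an IOPS ceiling. With the 100,000 B/s limit set above and 4,096-byte requests:

```shell
#!/bin/sh
# rbd_qos_bps_limit is expressed in bytes per second; 100000 B/s with
# 4096-byte operations allows at most about 24 operations per second.
BPS_LIMIT=100000
BLOCK_BYTES=4096
MAX_IOPS=$((BPS_LIMIT / BLOCK_BYTES))
echo "implied IOPS ceiling: ~${MAX_IOPS}"
```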

Enable pool QoS IOPS limit (200 IOPS) and retest

<code>rbd -p libvirt-pool config image set scan.img rbd_qos_bps_limit 0
rbd config pool set libvirt-pool rbd_qos_iops_limit 200</code>

The pool‑level IOPS limit has little effect on the performance of an individual image.

Enable pool QoS BPS limit (1 MB/s) and retest

<code>rbd config pool set libvirt-pool rbd_qos_iops_limit 0
rbd config pool set libvirt-pool rbd_qos_bps_limit 1000000</code>

Similarly, the pool‑level BPS limit does not noticeably affect the image throughput.

Test qemu block‑device BPS limiting

Check the current QoS of the VM's block device (vdb):

<code>virsh blkdeviotune scan vdb</code>

Limit the device to 5 MB/s BPS:

<code>virsh blkdeviotune scan vdb --total-bytes-sec 5000000 --live</code>

Test qemu block‑device IOPS limiting

<code>virsh blkdeviotune scan vdb --total-bytes-sec 0 --live
virsh blkdeviotune scan vdb --total-iops-sec 1000 --live</code>

Run a heavy fio workload inside the VM:

<code>fio --name=Test --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting</code>

Remove QoS settings and retest

<code>fio --name=Test --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting</code>
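The removal step itself is not spelled out above; one way to clear every limit set in this walkthrough is `rbd config ... rm`, which drops the override (setting the value back to 0 should work as well):

```shell
#!/bin/sh
# Clear the image-level QoS overrides
rbd -p libvirt-pool config image rm scan.img rbd_qos_iops_limit
rbd -p libvirt-pool config image rm scan.img rbd_qos_bps_limit

# Clear the pool-level QoS overrides
rbd config pool rm libvirt-pool rbd_qos_iops_limit
rbd config pool rm libvirt-pool rbd_qos_bps_limit

# Reset the qemu block-device throttles on the VM
virsh blkdeviotune scan vdb --total-bytes-sec 0 --total-iops-sec 0 --live
```

After this, the fio run above should return to the baseline numbers measured at the start.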

Test Conclusion

For Ceph RBD QoS, applying limits at the image level (both IOPS and BPS) yields clear throttling effects, whereas pool‑level limits have little impact. Using qemu's blkdeviotune to limit VM block devices also works and can be applied to local disks, offering a flexible alternative to Ceph‑only QoS.

Tags: performance testing, Linux, storage, Ceph, QoS, RBD
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
