Boost OpenStack Storage Efficiency with Ceph RBD Erasure Coding
This article explains how to integrate Ceph's erasure‑coded RBD pools with OpenStack, covering the design principles, storage pool layout, performance trade‑offs, and step‑by‑step configuration for Nova and Cinder to achieve higher storage utilization while maintaining high availability.
Ceph RBD Overview
Ceph provides block storage via RBD, accessible either through the kernel module (krbd), which exposes a virtual block device such as /dev/rbd0, or through the user‑space librbd library used by QEMU/KVM. Source code of interest includes drivers/block/rbd.c in the Linux kernel and the src/librbd tree in the Ceph GitHub repository.
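The two access paths can be sketched as follows. This assumes a reachable Ceph cluster and an existing image named rbd/vm-disk; both names are illustrative.

```shell
# 1) Kernel client (krbd): map the image to a local block device.
sudo rbd map rbd/vm-disk            # prints the device name, e.g. /dev/rbd0
lsblk /dev/rbd0                     # usable like any other block device
sudo rbd unmap /dev/rbd0

# 2) User-space client (librbd): QEMU opens the image directly,
#    with no kernel mapping involved.
qemu-img info rbd:rbd/vm-disk
```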
Replication vs. Erasure Coding
Default replication stores identical copies of each object (e.g., three copies leave 33% usable space). Erasure coding (EC) splits data into k data chunks and m parity chunks. A 4+2 scheme stores six chunks across six OSDs, yielding ~66% utilization while tolerating two simultaneous failures. Larger schemes such as 10+4 increase utilization further (~71%) but add CPU overhead for encode/decode.
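The utilization figures above follow directly from the chunk counts: replication with n copies leaves 1/n of raw capacity usable, while a k+m EC scheme leaves k/(k+m). A quick check:

```shell
#!/bin/sh
# Usable-capacity fraction: replication stores n full copies (1/n usable);
# k+m erasure coding stores k data + m parity chunks (k/(k+m) usable).
usable() { awk "BEGIN { printf \"%.1f\", 100 * $1 / $2 }"; }

echo "3-way replication: $(usable 1 3)% usable, tolerates 2 failures"
echo "EC 4+2:            $(usable 4 6)% usable, tolerates 2 failures"
echo "EC 10+4:           $(usable 10 14)% usable, tolerates 4 failures"
```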
RBD + EC Architecture
Production deployments separate metadata and data: metadata resides in a replicated pool, while block data resides in an EC pool. Writes first update metadata in the replica pool, then encode data and store fragments in the EC pool. Reads retrieve the mapping from the replica pool and decode fragments from the EC pool. Cloning is efficient because only metadata references change; data blocks are shared until a write triggers copy‑on‑write within the EC pool.
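The layout above can be created with the following configuration steps. Pool and profile names are illustrative; a running cluster (Luminous or later, which introduced EC overwrites for RBD) is assumed.

```shell
# 1) Replicated pool for RBD metadata (image headers, omap data).
ceph osd pool create rbd-meta 64 64 replicated
rbd pool init rbd-meta

# 2) 4+2 erasure-coded pool for the actual block data.
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create rbd-data 64 64 erasure ec42
ceph osd pool set rbd-data allow_ec_overwrites true   # required for RBD on EC

# 3) Create an image: the header lives in rbd-meta,
#    the data chunks land in rbd-data.
rbd create --size 10G --data-pool rbd-data rbd-meta/vm-disk
```

Setting allow_ec_overwrites is what makes the EC pool usable for RBD at all; without it, partial overwrites of existing objects are rejected.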
RBD EC Command Reference
1. Initialize pool: rbd pool init <poolname>
2. Create image: rbd create --size <size> --image-feature layering <poolname>/<name> --data-pool <datapool>
3. Resize image: rbd resize --size <size> <poolname>/<image>
4. Rename image: rbd mv <poolname>/<old> <poolname>/<new>
5. Flatten clone: rbd flatten <poolname>/<clone>
6. Delete image header: rados -p <poolname> rm rbd_header.<id>
7. Create snapshot: rbd snap create <poolname>/<image>@<snap>
8. Protect snapshot: rbd snap protect <poolname>/<image>@<snap>
9. Clone snapshot: rbd clone <poolname>/<image>@<snap> <destpool>/<clone> --data-pool <datapool>
10. Rollback snapshot: rbd snap rollback <poolname>/<image>@<snap>
11. Delete snapshot: rbd snap unprotect <poolname>/<image>@<snap>; rbd snap rm <poolname>/<image>@<snap>

Adapting OpenStack for Ceph EC RBD
Nova configuration (nova‑compute): set rbd_user, rbd_secret_uuid, erasure_rbd_meta_pool, and erasure_rbd_data_pool. Modify nova/virt/libvirt/storage/rbd_utils.py to detect EC‑enabled images and add the RBD_FEATURE_DATA_POOL flag when cloning.
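A minimal nova.conf sketch for the compute node, assuming the custom rbd_utils.py patch described above. Note that erasure_rbd_meta_pool and erasure_rbd_data_pool are options introduced by that patch, not stock Nova settings; all values here are illustrative.

```ini
# /etc/nova/nova.conf (nova-compute)
[libvirt]
images_type = rbd
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
; Options added by the article's rbd_utils.py patch (not upstream Nova):
erasure_rbd_meta_pool = rbd-meta
erasure_rbd_data_pool = rbd-data
```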
Cinder configuration (cinder‑volume): set rbd_type = erasure and erasure_data_pool = <data_pool>. Update cinder/volume/drivers/rbd.py to use the same EC features for clone operations.
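A matching cinder.conf sketch, again hedged: rbd_type and erasure_data_pool come from the article's driver patch rather than stock Cinder, and the backend and pool names are illustrative.

```ini
# /etc/cinder/cinder.conf (cinder-volume)
[ceph-ec]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = rbd-meta
rbd_user = cinder
; Options added by the article's rbd.py patch (not upstream Cinder):
rbd_type = erasure
erasure_data_pool = rbd-data
```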
Conclusion
The hybrid design—metadata in a replicated pool and block data in an EC pool—provides low‑latency snapshot/clone operations while reducing storage consumption by up to 50% compared with triple replication. This approach is suitable for a range of workloads, including databases, general compute, and cold backup, offering a cost‑effective block storage solution on top of Ceph for OpenStack deployments.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.