How to Seamlessly Hot-Migrate VM Disks to a Mixed-Flash Ceph Cluster with QMP
This article details the design, implementation, and results of migrating more than 16,000 virtual machine disks from an aging Ceph Luminous cluster to a high‑performance mixed‑flash Ceph cluster using QEMU QMP hot‑swap. The migration achieved a 47.65% reduction in IO latency and enabled the cost‑saving decommissioning of the legacy storage.
1. Background
360 Cloud Platform has long operated an IaaS built on a customized OpenStack stack with Ceph Luminous as the storage backend. The pure‑SATA Luminous cluster could no longer meet the IO demands of data‑intensive services, and its maintainability and observability were limited.
2. Objectives
Migrate more than 16,000 existing VM disks to a mixed‑flash Ceph cluster.
Achieve an IO latency target of approximately 464 µs.
Decommission old Ceph clusters after migration to reduce storage costs.
3. Solution Design
Key considerations included choosing between cold and hot migration, acceptable downtime, IO impact on running VMs, and data‑consistency guarantees.
Two options were evaluated:
Option 1: Export‑import synchronization between the old and new Ceph clusters, requiring VM shutdown during the final cut‑over.
Option 2: Develop a hot‑swap feature in the virtualization layer that uses QEMU QMP double‑write, allowing migration without VM downtime.
Option 2 was selected because it provides seamless hot migration with no user‑visible impact.
4. Implementation Details
Custom Nova API and compute extensions were developed. Version 1 used libvirt’s block‑copy interface but required a VM reboot to switch the disk to the remote RBD path. Version 2 switched to driving QEMU’s QMP interface for the storage hot‑copy, removing the remote‑RBD limitation.
5. Code Development
Nova‑API side implements request reception, parameter validation (volume type and VM UUID), and action recording.
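The nova-api checks described above can be sketched as follows. This is a minimal illustration, not Nova's actual API: the function name and the allowed volume-type set are hypothetical, and a real implementation would also record the action in the instance action log.

```python
import uuid

# Hypothetical target volume type name; the real deployment would use
# the volume type of the new mixed-flash Ceph cluster.
ALLOWED_TARGET_TYPES = {"mixed-flash"}

def validate_hot_swap_request(vm_uuid: str, volume_type: str) -> None:
    """Reject malformed hot-swap requests before any work starts.

    Raises ValueError if the VM UUID is not a valid UUID or the
    requested target volume type is not allowed.
    """
    try:
        uuid.UUID(vm_uuid)
    except ValueError:
        raise ValueError(f"invalid VM UUID: {vm_uuid!r}")
    if volume_type not in ALLOWED_TARGET_TYPES:
        raise ValueError(f"unsupported target volume type: {volume_type!r}")

# A well-formed request passes silently; a bad one raises immediately.
validate_hot_swap_request("7c9b1f2e-0d3a-4b5c-8e6f-112233445566", "mixed-flash")
```

Failing fast at the API layer keeps nova-compute from having to unwind a half-started migration for requests that were never valid.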
Nova‑compute side performs the following steps:
Lock the VM to prevent concurrent operations.
Verify that the volume is of the target type and is a cloud disk.
Set the old volume to a detaching state.
Create a new volume with the specified source_volid and size.
Attach the new volume and set it to the attaching state.
Invoke libvirt’s QMP API to start the copy, enabling IO double‑write.
Mark the new volume as attached and the old one as available.
Update the Nova BDM database with the new volume information and record the action.
Handle various exception scenarios and roll back disk data if necessary.
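The copy step above ultimately issues a QMP `drive-mirror` command, which bulk-copies the disk to the new target while double-writing ongoing guest IO. The sketch below only builds the JSON message; the device name, RBD target path, and job ID are hypothetical, and a real call would be sent over QEMU's QMP Unix socket after the `qmp_capabilities` handshake.

```python
import json

def build_drive_mirror(device: str, target: str, job_id: str) -> str:
    """Serialize a QMP drive-mirror command for the given device."""
    cmd = {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,   # guest block device to copy
            "target": target,   # destination, e.g. an rbd: path
            "sync": "full",     # copy the whole disk, then keep mirroring
            "job-id": job_id,   # name used to track/complete the job later
        },
    }
    return json.dumps(cmd)

# Illustrative values only -- not taken from the production deployment.
msg = build_drive_mirror("drive-virtio-disk1",
                         "rbd:volumes/new-volume-uuid",
                         "swap-disk1")
```

Once the job reaches the ready state, a `block-job-complete` command pivots the VM onto the new RBD volume, which is the moment the hot swap becomes effective.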
6. QMP Mechanism
QMP (QEMU Machine Protocol) is a JSON‑based protocol for querying and configuring a running QEMU instance. A disk copy consists of a copy phase (bulk transfer of existing data plus dirty blocks) and a mirroring phase (continuous double‑write of new guest IO). The core of QEMU’s mirror driver, mirror_run, is shown below.

static void coroutine_fn mirror_run(void *opaque) { ... }

7. Challenges and Solutions
The original block‑copy path supported only local block devices; this was solved by driving QMP directly to mirror disks to a remote RBD target.
Implementing hot‑swap logic required integrating QMP calls into Nova‑compute using Python coroutines for asynchronous execution and progress logging.
Rollback logic was added to restore the original volume state and set the VM to an error state if migration failed.
Abnormal VMs were handled by batch scripts that repair BDM and volume status to avoid data loss.
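The progress logging mentioned above can be driven by polling QMP's `query-block-jobs` and deriving a percentage from each job's `offset` and `len` fields. The reply below is a mocked example of the documented response shape, not output from a live QEMU, and the job name is hypothetical.

```python
def job_progress(jobs: list, job_id: str) -> float:
    """Return copy progress in percent for the named block job.

    `jobs` is the list returned by QMP query-block-jobs; each entry
    carries the bytes already copied (offset) and the total (len).
    """
    for job in jobs:
        if job.get("device") == job_id and job.get("len"):
            return 100.0 * job["offset"] / job["len"]
    return 0.0  # job not found or not yet started

# Mocked query-block-jobs reply: a mirror job halfway through a 1 GiB disk.
mock_reply = [{"device": "swap-disk1", "type": "mirror",
               "offset": 536870912, "len": 1073741824,
               "busy": True, "paused": False, "ready": False}]

print(f"copy progress: {job_progress(mock_reply, 'swap-disk1'):.1f}%")
```

Polling this in the nova-compute coroutine gives operators a live progress figure and a natural hook for deciding when the job is ready to pivot or must be rolled back.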
8. Results
Testing demonstrated a 47.65% reduction in VM IO latency after migration to the new NVMe mixed‑flash Ceph cluster, greatly improving overall disk performance. The solution is packaged for operations teams, enabling the migration of more than 16,000 VM disks and the decommissioning of 13 legacy Ceph clusters, resulting in substantial cost savings.
9. Future Work
Add hot‑swap support for local disks.
Adapt the hot‑swap feature to different QEMU versions.
Implement progress display, copy‑speed throttling, and interruptible copy functionality.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.