Boost VM Live Migration Success: Taming Dirty Pages with Memory Copy Optimizations
This article explains why dirty‑page generation outpaces memory copy during VM live migration, describes the pre‑copy algorithm, and presents three practical KVM/QEMU tuning methods—xbzrle compression, memory compression, and CPU throttling—along with command‑line examples and performance results.
Background
Our internal VM live migrations had a low success rate. We traced the cause to the guest dirtying memory faster than the pages could be copied, so the memory copy never finished and the migration failed.
Goal
Reduce dirty‑page generation speed below the network transmission speed so that memory migration completes quickly.
Memory Migration Basics
Live migration aims to transfer the entire memory state of a VM from the source host to the target host while minimizing downtime. The process follows a pre‑copy model with multiple iterations and dirty‑page tracking.
Initial Full Copy: At migration start, QEMU copies all of the VM's physical memory pages to the target over the network while the VM continues running on the source.
Iterative Copy & Dirty-Page Tracking: After the initial copy, KVM tracks the pages the guest has modified since they were last sent (dirty pages) in a bitmap. Each iteration transfers the newly dirtied pages, resets the bitmap, and repeats until the remaining dirty data is small.
Stop Phase: When the remaining dirty data drops to a few tens to a few hundred MB (QEMU estimates whether the remainder can be sent within the configured downtime limit), the VM is paused, the remaining dirty pages, CPU registers, and device state are sent, and the VM resumes on the target.
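The iteration above converges only if each round leaves less dirty data than the round before it. The arithmetic can be sketched in POSIX shell, assuming a constant dirty rate and constant bandwidth (real workloads fluctuate). The helper name precopy_rounds and its arguments are hypothetical, with sizes in MB and rates in MB/s:

```shell
# Simplified pre-copy model (hypothetical helper, not part of QEMU or libvirt).
# Arguments: VM memory (MB), bandwidth (MB/s), dirty rate (MB/s),
#            stop threshold (MB), maximum rounds to try.
# Prints the number of copy rounds (including the initial full copy) needed
# before the stop phase, or "no-converge" if the remaining data never falls
# to the threshold within the allowed rounds.
precopy_rounds() (
  remaining=$1; bandwidth=$2; dirty_rate=$3; threshold=$4; max_rounds=$5
  rounds=0
  while [ "$remaining" -gt "$threshold" ]; do
    if [ "$rounds" -ge "$max_rounds" ]; then
      echo "no-converge"
      exit 1
    fi
    # Sending `remaining` MB takes remaining/bandwidth seconds; in that time
    # the guest dirties dirty_rate * (remaining/bandwidth) MB (integer math).
    remaining=$(( remaining * dirty_rate / bandwidth ))
    rounds=$(( rounds + 1 ))
  done
  echo "$rounds"
)
```

For example, precopy_rounds 16384 1000 400 300 30 prints 5: a 16 GB VM dirtying 400 MB/s over a 1000 MB/s link shrinks to under 300 MB after five rounds. With a dirty rate of 1200 MB/s on the same link, the remaining data grows each round and the helper prints no-converge, which is the failure mode described in the Background.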
Optimization 1: xbzrle Compression
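Purpose: XBZRLE (Xor Based Zero Run Length Encoding) lowers network traffic and overall migration time during the pre-copy iterations by sending only the difference between a dirty page and its cached previous version instead of the whole page.
Effect: The benefit depends on the cache size, because a page can only be delta-encoded while its previous version is still in the cache. It works best when the host has memory to spare and the VM writes to memory frequently.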
virsh qemu-monitor-command <VM-name> --hmp 'migrate_set_capability xbzrle on'
virsh qemu-monitor-command <VM-name> --hmp 'migrate_set_parameter xbzrle-cache-size 1024m'
Recommended cache size: about 1% of the VM’s memory.
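The 1% guideline can be turned into a small sizing helper. This is a sketch only: the function names are hypothetical, and rounding down to a power of two is an assumption based on QEMU's parameter documentation, which asks for a power-of-two cache size. The second helper prints the monitor command rather than running it, so the value can be reviewed first:

```shell
# Hypothetical helper: suggest an xbzrle cache size for a VM.
# Argument: VM memory in MB. Prints a size in MB: about 1% of VM memory,
# rounded down to a power of two, with a floor of 1 MB.
xbzrle_cache_mb() (
  vm_mem_mb=$1
  target=$(( vm_mem_mb / 100 ))
  size=1
  while [ $(( size * 2 )) -le "$target" ]; do
    size=$(( size * 2 ))
  done
  echo "$size"
)

# Hypothetical helper: print (not run) the matching monitor command.
# Arguments: VM name, VM memory in MB.
xbzrle_cache_cmd() (
  vm=$1; vm_mem_mb=$2
  echo "virsh qemu-monitor-command $vm --hmp 'migrate_set_parameter xbzrle-cache-size $(xbzrle_cache_mb "$vm_mem_mb")m'"
)
```

For a 100 GB VM, xbzrle_cache_mb 102400 prints 1024, matching the 1024m used in the command above; for a 16 GB VM it prints 128.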
Optimization 2: compress (Memory Compression)
Purpose: Compress memory pages before sending them, trading CPU time for less network traffic. The compression level sets the balance between compression speed and ratio.
Effect: The speed-up depends on the achieved compression ratio and on available CPU capacity; results can be unstable.
# Enable the compress capability (the parameters below have no effect without it)
virsh qemu-monitor-command <VM-id> --hmp 'migrate_set_capability compress on'
# Set compression level (1-9, higher = stronger, default 1)
virsh qemu-monitor-command <VM-id> --hmp 'migrate_set_parameter compress-level 6'
# Set number of CPU threads for compression (typically about half of the host cores)
virsh qemu-monitor-command <VM-id> --hmp 'migrate_set_parameter compress-threads 24'
# Decompression threads (usually same as compression)
virsh qemu-monitor-command <VM-id> --hmp 'migrate_set_parameter decompress-threads 24'
Recommended: compress-level=9, threads≈½ of host CPU cores.
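The thread recommendation depends on the host, so it may help to derive the values instead of hard-coding them. A minimal sketch, assuming a hypothetical helper compress_cmds that takes the VM name, the host core count, and the compression level, and prints the three parameter commands without running them:

```shell
# Hypothetical helper: print (not run) the compression parameter commands.
# Arguments: VM name, host CPU core count, compression level.
# Threads are set to half the host cores, with a minimum of 1.
compress_cmds() (
  vm=$1; host_cores=$2; level=$3
  threads=$(( host_cores / 2 ))
  [ "$threads" -ge 1 ] || threads=1
  for setting in \
    "compress-level $level" \
    "compress-threads $threads" \
    "decompress-threads $threads"
  do
    echo "virsh qemu-monitor-command $vm --hmp 'migrate_set_parameter $setting'"
  done
)
```

On a 48-core host, compress_cmds myvm 48 9 yields 24 compression and 24 decompression threads. On Linux the core count can be supplied with "$(nproc)". Printing the commands first keeps the sketch safe to try on a machine without libvirt.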
Optimization 3: cpu‑throttle
Purpose: Limit the VM's CPU time by inserting sleep cycles, which slows dirty-page generation and helps the migration converge when the other optimizations are not enough.
Effect: High throttle percentages noticeably speed up migration but degrade VM performance.
virsh qemu-monitor-command myvm --hmp 'migrate_set_capability auto-converge on'
virsh qemu-monitor-command myvm --hmp 'migrate_set_parameter cpu-throttle-initial 30'
virsh qemu-monitor-command myvm --hmp 'migrate_set_parameter cpu-throttle-increment 5'
Enable this only if migration still fails after the other optimizations. Start with a low initial throttle and raise it only when the migration runs long (rule of thumb: past ten times the VM memory size).
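To see how quickly these settings bite into guest performance, the throttle escalation can be modeled. This is a rough sketch, assuming that QEMU raises the throttle by cpu-throttle-increment each time it finds the migration is still not converging, and that the value is capped (the cap is passed in as an argument; 99 is used below as an assumed value for QEMU's max-cpu-throttle). The helper name throttle_schedule is hypothetical:

```shell
# Hypothetical model of auto-converge escalation (not a QEMU interface).
# Arguments: initial throttle (%), increment (%), cap (%), number of steps.
# Prints the throttle percentage in effect at each step, one per line.
throttle_schedule() (
  pct=$1; increment=$2; cap=$3; steps=$4
  i=0
  while [ "$i" -lt "$steps" ]; do
    echo "$pct"
    pct=$(( pct + increment ))
    # Never exceed the cap.
    [ "$pct" -le "$cap" ] || pct=$cap
    i=$(( i + 1 ))
  done
)
```

With the values above, throttle_schedule 30 5 99 4 prints 30, 35, 40, 45 on separate lines, so the guest loses CPU time gradually. A larger increment converges sooner at the cost of a steeper performance drop, which is why the text suggests starting low.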
Summary
With the three optimizations applied, a VM dirtying memory at 400 MB/s, which previously could not be migrated at all, completed migration in about 90 seconds, and a VM dirtying memory at 800 MB/s completed in about 180 seconds.
Outlook
Future work includes multithreaded copy (multifd) and addressing other issues such as CPU instruction‑set mismatches.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
