Cut Migration Time by 60%: Baidu Cloud Deploys Intel Xeon 6 QAT‑Accelerated Live VM Migration
This article examines the challenges of large‑scale live VM migration, introduces the QAT hardware accelerator integrated into Intel Xeon 6 CPUs, compares the migration workflow before and after QAT, and reports a roughly 60% reduction in migration time, CPU savings of over 20%, and downtime at the ten‑millisecond level in Baidu Smart Cloud production.
Background and Challenges
Live VM migration is a core capability of cloud platforms, used for host maintenance, load balancing, and fault avoidance. As VM memory size and dirty‑page rate increase, migration takes longer, and the migration itself competes with running workloads for CPU, memory bandwidth, and network resources, which can cause noticeable downtime.
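To make the dependence on memory size and dirty‑page rate concrete, here is a minimal sketch of a textbook iterative pre‑copy model. It is not Baidu's or Intel's model: the function name `precopy_time`, its parameters, and the sample numbers are all assumptions for illustration, and it assumes compression throughput is never the bottleneck.

```python
def precopy_time(mem_gb, dirty_gbps, link_gbps, ratio=1.0,
                 max_rounds=30, stop_gb=0.05):
    """Estimate total migration time in seconds under a simple pre-copy model.

    ratio is the compression ratio (uncompressed / compressed size), so guest
    memory effectively moves at link_gbps * ratio GB per second.
    """
    rate = link_gbps * ratio          # GB of guest memory moved per second
    to_send = mem_gb                  # round 1 copies all of memory
    total = 0.0
    for _ in range(max_rounds):
        t = to_send / rate
        total += t
        # Pages dirtied while this round was in flight must be resent.
        to_send = min(mem_gb, dirty_gbps * t)
        if to_send <= stop_gb:        # small enough for the final stop-and-copy
            break
    return total + to_send / rate     # add the final stop-and-copy transfer


if __name__ == "__main__":
    # Hypothetical inputs: 64 GB VM, 0.5 GB/s dirty rate, 2 GB/s link.
    print(precopy_time(64, 0.5, 2.0))             # no compression
    print(precopy_time(64, 0.5, 2.0, ratio=2.0))  # 2:1 compression
```

In this model the total time approaches mem / (rate − dirty rate) when the dirty rate is below the effective transfer rate, which is why both a larger VM and a higher dirty rate lengthen migration, and why a better compression ratio shortens it.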
Innovation: CPU‑Integrated QAT Acceleration
Traditional migration compresses memory in software, which consumes CPU cycles. Baidu Cloud partnered with Intel to use the QAT (QuickAssist Technology) engine integrated into Intel Xeon 6 processors with Performance‑cores (Granite Rapids, GNR). QAT offloads compression and decompression from the CPU cores, supports lz4 and zlib, and processes multiple streams in parallel, which reduces CPU load.
CPU load offload: compression and decompression move from CPU cores to dedicated hardware.
High parallelism: multiple concurrent streams achieve higher throughput than software compression.
Algorithm compatibility: supports the mainstream lz4 and zlib algorithms.
Full‑path coverage: hardware handles both compression on the source and decompression on the destination.
Workflow Comparison
Before QAT
During pre‑copy, the source host CPU performs dirty‑page detection and software compression (lz4/zlib). Compressed pages are sent over the network, then the destination host CPU decompresses and writes pages. CPU cycles for compression compete with guest workloads, especially for large‑memory VMs.
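The software path described above can be sketched as a toy in‑memory model. This is an illustration only, not a hypervisor implementation: `send_dirty_pages`, `receive_pages`, and `migrate_once` are hypothetical names, guest memory is modelled as a list of 4 KiB `bytes` pages, and the "network" is a Python list.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def send_dirty_pages(src_mem, dirty, level=1):
    """Source side: compress each dirty page on the CPU, yield (index, payload)."""
    for idx in sorted(dirty):
        yield idx, zlib.compress(src_mem[idx], level)


def receive_pages(dst_mem, stream):
    """Destination side: decompress on the CPU and write pages into place."""
    for idx, payload in stream:
        page = zlib.decompress(payload)
        if len(page) != PAGE_SIZE:
            raise ValueError(f"page {idx} has unexpected size {len(page)}")
        dst_mem[idx] = page


def migrate_once(src_mem, dst_mem, dirty):
    """Run one pre-copy round and return the number of bytes put on the wire."""
    wire = list(send_dirty_pages(src_mem, dirty))  # stand-in for the network
    receive_pages(dst_mem, wire)
    return sum(len(payload) for _, payload in wire)
```

Every call to `zlib.compress` and `zlib.decompress` here runs on a host CPU core, which is the work that competes with guest workloads in the pre‑QAT flow.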
After QAT
During pre‑copy, the CPU only detects dirty pages; the QAT user‑mode library (QATzip) submits them to the on‑chip QAT engine, which compresses them in parallel in hardware. Because the compressed payload is smaller, it crosses the network faster. On the destination, QAT hardware decompresses the pages before they are written to memory, while the CPU only handles VM pause/start control.
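The shape of this flow, where the CPU batches dirty pages and hands them to a compression backend over several parallel streams, can be sketched as follows. This does not call QAT or the QATzip API: `zlib` is used purely as a software stand‑in for the backend, and every function name below is hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_batches(pages, streams):
    """Deal pages round-robin into one batch per stream."""
    return [pages[i::streams] for i in range(streams)]


def compress_parallel(pages, streams=4, backend=zlib.compress):
    """Compress one batch per stream concurrently.

    backend is a stand-in: an offload engine would take the place of zlib here.
    """
    batches = split_batches(pages, streams)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return list(pool.map(lambda batch: backend(b"".join(batch)), batches))


def decompress_parallel(payloads, page_size, decode=zlib.decompress):
    """Decompress every stream concurrently and restore the original page order."""
    with ThreadPoolExecutor(max_workers=max(1, len(payloads))) as pool:
        blobs = list(pool.map(decode, payloads))
    batches = [[blob[off:off + page_size] for off in range(0, len(blob), page_size)]
               for blob in blobs]
    pages = [None] * sum(len(batch) for batch in batches)
    for stream, batch in enumerate(batches):
        pages[stream::len(batches)] = batch  # undo the round-robin split
    return pages
```

Keeping the backend behind a plain callable is one way to let the same batching logic run against either a software codec or an accelerator; whether the production system is structured this way is not stated in the source.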
Production Deployment
Through joint engineering with Intel, Baidu Cloud resolved stability issues under high concurrency, improved the efficiency of previously slow migrations of large‑memory VMs, and integrated QAT acceleration into the cloud scheduler.
Results in Baidu Cloud
Migration time for a 64 GB VM dropped from 33 s to 12 s, a reduction of roughly 60% ((33 − 12) / 33 ≈ 64%).
Host CPU utilization during migration decreased by over 20 %.
Downtime dropped to the ten‑millisecond level, meeting high‑SLA requirements.
Network bandwidth usage fell due to higher compression efficiency, improving link stability.
Large‑memory workloads (caches, CDN, offline jobs) benefited most, achieving fast, non‑disruptive migration.
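As a quick check on the headline figure, the reduction can be recomputed from the two reported times. The helper name `reduction_pct` is just for illustration; the inputs are the 33 s and 12 s values reported above.

```python
def reduction_pct(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    print(f"{reduction_pct(33, 12):.1f}%")  # 63.6%
```

The measured times therefore correspond to a reduction slightly above the rounded 60% figure.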
Conclusion
QAT‑accelerated live migration has become a foundational capability in Baidu Smart Cloud, delivering high‑concurrency migration with little perceptible impact on production workloads and paving the way for deeper co‑optimization between CPU silicon and cloud services.
Baidu Intelligent Cloud Tech Hub