Mastering Production KVM Virtualization: CPU, Memory, Network & Storage Best Practices
This article shares practical production‑level KVM virtualization techniques, covering CPU binding and host‑passthrough, memory management, network optimization with Open vSwitch, storage choices, VM time drift handling, and resource limiting via CGroup, offering actionable insights for reliable, high‑performance virtualized environments.
This article, compiled from the "Efficient Operations" WeChat series, presents hands‑on experience from a long‑term virtualization project, focusing on migrating existing services to a virtualized environment.
CPU Technical Points
The author highlights CPU binding (pinning a VM's virtual CPUs to specific physical cores) as a powerful technique that can be applied online, without restarting the guest, to resolve performance bottlenecks. In one case, a game server lagged during a weekend event: the host's first few CPU cores were overloaded while the remaining cores sat idle. Rebinding the vCPUs online balanced utilization and eliminated the lag.
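As a rough illustration of online binding, the sketch below builds one `virsh vcpupin` command per vCPU and spreads the vCPUs round-robin over a list of host cores. The guest name `game01`, the vCPU count, and the core numbers are hypothetical, and this is not necessarily the procedure the author used. The commands are only printed, not executed.

```python
from typing import List


def build_vcpupin_commands(domain: str, vcpu_count: int,
                           host_cpus: List[int]) -> List[List[str]]:
    """Return one `virsh vcpupin` argv per vCPU, assigned round-robin to host_cpus."""
    if not host_cpus:
        raise ValueError("host_cpus must not be empty")
    commands = []
    for vcpu in range(vcpu_count):
        target = host_cpus[vcpu % len(host_cpus)]
        # --live changes the running guest only; the pinning is not persisted
        # in the guest's stored configuration.
        commands.append(
            ["virsh", "vcpupin", domain, str(vcpu), str(target), "--live"]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical guest "game01" with 4 vCPUs, moved onto host cores 8-11.
    for argv in build_vcpupin_commands("game01", 4, [8, 9, 10, 11]):
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` on the host once the target cores have been chosen.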
CPU host‑passthrough
CPU host‑passthrough passes all physical CPU features to the virtual CPU, improving performance for workloads that can leverage those features. It also allows the guest to see the same CPU brand and model as the host, which benefits public‑cloud users. However, this technique does not support live migration between hosts with different CPU models.
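In libvirt, this mode is selected with a `<cpu mode='host-passthrough'/>` element in the guest definition. The sketch below generates that element and adds a deliberately simplistic check for the migration caveat: it only compares CPU model strings, which is an assumption made for illustration, and the model names are made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def host_passthrough_cpu_xml() -> str:
    """Return the libvirt <cpu> element that enables host-passthrough."""
    cpu = ET.Element("cpu", {"mode": "host-passthrough"})
    return ET.tostring(cpu, encoding="unicode")


def hosts_share_cpu_model(models: List[str]) -> bool:
    """Simplified check: True only if every host reports the same CPU model string."""
    return len(set(models)) == 1


if __name__ == "__main__":
    print(host_passthrough_cpu_xml())
    # Hypothetical model strings for two hosts in a migration pool.
    print(hosts_share_cpu_model(["model-a", "model-b"]))
```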
Memory Technical Points
When virtualizing, it is generally recommended to disable KSM (Kernel Samepage Merging, a memory deduplication feature rather than compression). The KSM daemon continuously scans memory, which consumes CPU cycles, and under high load it can lead to excessive swapping that severely degrades VM performance.
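On Linux hosts, KSM is controlled through files under `/sys/kernel/mm/ksm`. The following is a minimal sketch for inspecting and stopping it; the function names are illustrative, writing to these files requires root, and on some distributions a service may switch KSM back on, so treat this as a starting point rather than a complete procedure.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def ksm_status(ksm_dir: Path = KSM_DIR) -> dict:
    """Read whether KSM is scanning and how many pages are currently deduplicated."""
    run = int((ksm_dir / "run").read_text().strip())
    sharing = int((ksm_dir / "pages_sharing").read_text().strip())
    return {"running": run == 1, "pages_sharing": sharing}


def disable_ksm(ksm_dir: Path = KSM_DIR, unmerge: bool = True) -> None:
    """Stop KSM scanning.

    Writing 0 stops the scanner but keeps already merged pages;
    writing 2 stops it and unmerges all merged pages.
    """
    (ksm_dir / "run").write_text("2" if unmerge else "0")
```

Calling `ksm_status()` before and after `disable_ksm()` shows whether the scanner has actually stopped.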
Network Technical Points
Network challenges focus on manageability and performance. Manageability relies on Open vSwitch, a pure‑software switch that can communicate with physical switches at the protocol level.
Performance can be improved with hardware (10 GbE NICs, SR-IOV) or with software (the virtio paravirtualized network driver, or dedicating a physical NIC to a single VM).
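To make the software side concrete, the sketch below generates a libvirt `<interface>` element that attaches a guest to an Open vSwitch bridge with a virtio NIC model. The bridge name `ovsbr0` is a hypothetical example, and the element shows only the minimum attributes.

```python
import xml.etree.ElementTree as ET


def ovs_virtio_interface_xml(bridge: str) -> str:
    """Return a libvirt <interface> element for an Open vSwitch bridge using virtio."""
    iface = ET.Element("interface", {"type": "bridge"})
    ET.SubElement(iface, "source", {"bridge": bridge})
    # Tells libvirt that the bridge is managed by Open vSwitch.
    ET.SubElement(iface, "virtualport", {"type": "openvswitch"})
    # Paravirtualized NIC model instead of an emulated hardware NIC.
    ET.SubElement(iface, "model", {"type": "virtio"})
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    print(ovs_virtio_interface_xml("ovsbr0"))  # "ovsbr0" is a made-up bridge name
```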
VM Time Drift
All VMs, whether KVM, VMware, Xen, or Hyper-V, experience clock drift because their clocks are emulated and can run faster or slower than the physical hardware clock. Modern operating systems mitigate this, but it is still advisable to configure precise clock sources and NTP in production to ensure accurate timekeeping, especially for time-sensitive services.
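For KVM guests managed by libvirt, clock sources are configured in the `<clock>` element of the guest definition. The sketch below generates one commonly seen combination of timer settings; the specific values are an assumption for illustration and are not taken from the author's environment. NTP still has to be configured inside the guest.

```python
import xml.etree.ElementTree as ET


def guest_clock_xml() -> str:
    """Return a libvirt <clock> element with a paravirtualized clock and tick policies."""
    clock = ET.Element("clock", {"offset": "utc"})
    # kvmclock is the paravirtualized clock source for Linux guests on KVM.
    ET.SubElement(clock, "timer", {"name": "kvmclock", "present": "yes"})
    # Deliver missed RTC ticks at a higher rate until the guest catches up.
    ET.SubElement(clock, "timer", {"name": "rtc", "tickpolicy": "catchup"})
    # Delay PIT ticks rather than dropping them.
    ET.SubElement(clock, "timer", {"name": "pit", "tickpolicy": "delay"})
    # Hide the emulated HPET from the guest.
    ET.SubElement(clock, "timer", {"name": "hpet", "present": "no"})
    return ET.tostring(clock, encoding="unicode")


if __name__ == "__main__":
    print(guest_clock_xml())
```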
Disk
For virtual machine disk images, qcow2 or LVM are recommended due to their support for dynamic expansion, snapshots, and thin provisioning, which simplify management.
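As a small illustration of the qcow2 workflow, the sketch below builds `qemu-img` command lines for creating a thin-provisioned image and for taking an internal snapshot. The paths and snapshot name used in the example are hypothetical, and the commands are printed rather than run.

```python
from typing import List


def qcow2_create_argv(path: str, size_gib: int) -> List[str]:
    """Build a qemu-img command that creates a qcow2 image of the given virtual size."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    # qcow2 allocates space on demand, so the file starts far smaller than size_gib.
    return ["qemu-img", "create", "-f", "qcow2", path, f"{size_gib}G"]


def qcow2_snapshot_argv(path: str, name: str) -> List[str]:
    """Build a qemu-img command that creates an internal snapshot.

    Intended for images that are not in use by a running guest.
    """
    return ["qemu-img", "snapshot", "-c", name, path]


if __name__ == "__main__":
    # Hypothetical image path and snapshot name.
    print(" ".join(qcow2_create_argv("/var/lib/libvirt/images/vm01.qcow2", 40)))
    print(" ".join(qcow2_snapshot_argv("/var/lib/libvirt/images/vm01.qcow2", "pre-upgrade")))
```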
The VirtIO driver is the standard choice for disk I/O. As a paravirtualized driver, it avoids the overhead of emulating real disk hardware and delivers near-native performance.
Common cache modes include writeback, writethrough, none, and unsafe. On CentOS, writeback is the default, leveraging the host filesystem cache for better performance.
In production, the author prefers writethrough for standalone virtualization to prioritize data safety, and none for clustered environments where live migration is required.
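The preference above can be written down as a small helper that picks the cache mode and emits the matching libvirt `<disk>` element. This is a sketch under assumptions: the function names, the image path, and the `vda` target are illustrative, and the image is assumed to be a qcow2 file on a virtio bus.

```python
import xml.etree.ElementTree as ET


def choose_cache_mode(clustered: bool) -> str:
    """Map the deployment type to the cache mode preferred in the text."""
    # "none" for clusters that need live migration, "writethrough" for standalone hosts.
    return "none" if clustered else "writethrough"


def disk_xml(image_path: str, clustered: bool) -> str:
    """Return a libvirt <disk> element for a qcow2 image on a virtio bus."""
    disk = ET.Element("disk", {"type": "file", "device": "disk"})
    ET.SubElement(disk, "driver", {
        "name": "qemu",
        "type": "qcow2",
        "cache": choose_cache_mode(clustered),
    })
    ET.SubElement(disk, "source", {"file": image_path})
    ET.SubElement(disk, "target", {"dev": "vda", "bus": "virtio"})
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical image path on a standalone host.
    print(disk_xml("/var/lib/libvirt/images/vm01.qcow2", clustered=False))
```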
Virtualization Storage Methods
Standalone virtualization
This model runs multiple VMs on a single host, with compute, storage, and networking all residing on that host. It requires no changes to the existing environment and can be deployed quickly.
Virtualization cluster
This approach combines commercial storage with multiple compute nodes. VM images reside on shared storage, allowing live migration, high availability, and dynamic resource balancing.
Choosing Commercial Storage
Common storage types include file and block storage; block storage can be iSCSI or Fibre Channel (FC). Production environments should use dual‑controller, fully redundant storage to avoid single points of failure.
FC offers the highest performance but at a higher cost; the author prefers iSCSI for its cost-effectiveness and sufficient performance. The choice between them comes down to:
Business performance requirements
Budget
Familiarity with the technology
Distributed File System
This variant replaces commercial storage with a cluster of ordinary servers, enabling massive scale and dynamic expansion—an architecture commonly used by public clouds.
Typical application scenarios:
Standalone virtualization: high load, low VM density, few VMs per rack.
Cluster virtualization: moderate load, VM density >1:7, many VMs, fast deployment, high availability.
Distributed file system virtualization: overall disk I/O < 1000 IOPS, often combined with commercial clustered storage.
SSD usage in virtualized storage is increasing, and software‑defined storage that combines SSDs with software control is becoming popular.
VM Resource Limits
In production, it is essential to limit VM resources to prevent a single VM from starving others. CGroup provides flexible and granular resource control, though its configuration can be complex.
Libvirt adds a layer on top of CGroup; by editing the VM’s XML definition, resource limits can be applied. Detailed instructions are available on the author’s blog.
http://xiaoli110.blog.51cto.com/1724/1070201
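As a rough idea of what such XML-based limits look like, the sketch below generates libvirt `<cputune>`, `<memtune>`, and `<blkiotune>` elements, which libvirt maps onto cgroup controls. The numeric values in the example are hypothetical and are not recommendations from the author; the linked blog post remains the reference for the author's own settings.

```python
import xml.etree.ElementTree as ET
from typing import List


def resource_limit_xml(cpu_shares: int, vcpu_period_us: int, vcpu_quota_us: int,
                       mem_hard_limit_kib: int, blkio_weight: int) -> List[str]:
    """Return <cputune>, <memtune> and <blkiotune> elements as XML strings."""
    cputune = ET.Element("cputune")
    # Relative CPU weight compared with other guests on the same host.
    ET.SubElement(cputune, "shares").text = str(cpu_shares)
    # Each vCPU may run for at most quota microseconds in every period.
    ET.SubElement(cputune, "period").text = str(vcpu_period_us)
    ET.SubElement(cputune, "quota").text = str(vcpu_quota_us)

    memtune = ET.Element("memtune")
    # Upper bound on the memory the guest process may use.
    ET.SubElement(memtune, "hard_limit", {"unit": "KiB"}).text = str(mem_hard_limit_kib)

    blkiotune = ET.Element("blkiotune")
    # Relative block I/O weight compared with other guests.
    ET.SubElement(blkiotune, "weight").text = str(blkio_weight)

    return [ET.tostring(e, encoding="unicode") for e in (cputune, memtune, blkiotune)]


if __name__ == "__main__":
    # Hypothetical values: half a core per vCPU, 4 GiB memory cap, mid I/O weight.
    for fragment in resource_limit_xml(1024, 100000, 50000, 4 * 1024 * 1024, 500):
        print(fragment)
```

These fragments belong directly under the `<domain>` element of the guest definition.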
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
