Mastering Production KVM Virtualization: CPU, Memory, Network & Storage Best Practices

This article shares practical production‑level KVM virtualization techniques, covering CPU binding and host‑passthrough, memory management, network optimization with Open vSwitch, storage choices, VM time drift handling, and resource limiting via CGroup, offering actionable insights for reliable, high‑performance virtualized environments.

Efficient Ops

This article, compiled from the "Efficient Operations" WeChat series, presents hands‑on experience from a long‑term virtualization project, focusing on migrating existing services to a virtualized environment.

CPU Technical Points

The author highlights CPU binding (pinning vCPUs to specific host cores) as a powerful technique that can be applied online, without restarting the VM, to resolve performance bottlenecks. In one case, a game server began to lag during a weekend event: the host's first few cores were overloaded while the remaining cores sat idle. Rebinding the vCPUs online balanced utilization and eliminated the lag.
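As a rough illustration of online binding, the sketch below builds `virsh vcpupin` commands that repin a running guest. The domain name `game01` and the core layout are hypothetical; the article does not say which tool or which cores were used in the original case.

```python
import shlex


def vcpupin_commands(domain, pinning, live=True):
    """Build one `virsh vcpupin` command per guest vCPU.

    `pinning` maps a guest vCPU index to a host CPU list string,
    for example "4" or "8-11".
    """
    commands = []
    for vcpu in sorted(pinning):
        args = ["virsh", "vcpupin", domain, str(vcpu), pinning[vcpu]]
        if live:
            # --live applies the change to the running guest, no restart needed.
            args.append("--live")
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Hypothetical layout: spread a 4-vCPU guest over idle host cores 8-11.
    for cmd in vcpupin_commands("game01", {0: "8", 1: "9", 2: "10", 3: "11"}):
        print(cmd)
```

The function only prints the commands so they can be reviewed before anything is run on a production host.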

CPU host‑passthrough

CPU host-passthrough exposes the host's physical CPU model and all of its features directly to the virtual CPU, improving performance for workloads that can use those features. The guest also sees the same CPU brand and model as the host, which benefits public-cloud users. The trade-off is live migration: because the guest depends on the exact host CPU, migration is only safe between hosts with the same CPU model.
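In libvirt, this mode is selected with the `<cpu>` element of the domain XML. The sketch below generates that element; the helper name `cpu_element` is made up for illustration, and the `host-model` fallback is an assumption added here as the more migration-friendly alternative, not something the article discusses.

```python
import xml.etree.ElementTree as ET


def cpu_element(passthrough=True):
    """Return the libvirt <cpu> element as an XML string.

    host-passthrough exposes the exact host CPU to the guest;
    host-model (an assumption, not from the article) is a looser mode.
    """
    cpu = ET.Element("cpu")
    cpu.set("mode", "host-passthrough" if passthrough else "host-model")
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    # Paste the printed element inside <domain> via `virsh edit <domain>`.
    print(cpu_element())
```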

Memory Technical Points

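When virtualizing, it is generally recommended to disable KSM (Kernel Samepage Merging, a memory deduplication feature rather than compression). KSM continuously scans memory for identical pages, which consumes CPU cycles, and when merged pages diverge under high load the host can run short of memory and swap heavily, severely degrading VM performance.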
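On Linux, KSM is controlled through the sysfs file `/sys/kernel/mm/ksm/run`: `0` stops the scanner, `1` starts it, and `2` stops it and unmerges pages that are already shared. A minimal sketch, assuming root privileges on the host; the helper name `set_ksm` is hypothetical:

```python
from pathlib import Path

KSM_RUN = Path("/sys/kernel/mm/ksm/run")

# Values accepted by the kernel's KSM run file.
KSM_STATES = {"stop": "0", "run": "1", "unmerge": "2"}


def set_ksm(state, run_file=KSM_RUN):
    """Write the requested KSM state and return what the file now reports."""
    run_file.write_text(KSM_STATES[state])
    return run_file.read_text().strip()


if __name__ == "__main__":
    # Stop KSM and break up already-merged pages (requires root).
    print(set_ksm("unmerge"))
```

On CentOS, a tuning service may switch KSM back on, so it is worth checking whether the `ksm` and `ksmtuned` services are enabled as well.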

Network Technical Points

Network challenges center on manageability and performance. Manageability relies on Open vSwitch, a pure-software switch that speaks standard switching protocols (for example 802.1Q VLANs and LACP), so it can interoperate with physical switches at the protocol level.

Performance can be improved in hardware (10 GbE NICs, SR-IOV) or in software (virtio paravirtualized drivers, or dedicating a physical NIC to a single VM).
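To show how the two software pieces fit together, the sketch below generates a libvirt `<interface>` element that attaches a guest NIC to an Open vSwitch bridge and uses the virtio model. The bridge name `ovsbr0` and VLAN tag `100` are placeholders, not values from the article.

```python
import xml.etree.ElementTree as ET


def ovs_virtio_interface(bridge, vlan_tag=None):
    """Return a libvirt <interface> element (as a string) for an OVS bridge."""
    iface = ET.Element("interface", type="bridge")
    ET.SubElement(iface, "source", bridge=bridge)
    # Tells libvirt that the bridge is an Open vSwitch bridge.
    ET.SubElement(iface, "virtualport", type="openvswitch")
    if vlan_tag is not None:
        vlan = ET.SubElement(iface, "vlan")
        ET.SubElement(vlan, "tag", id=str(vlan_tag))
    # Paravirtualized NIC instead of an emulated hardware model.
    ET.SubElement(iface, "model", type="virtio")
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    print(ovs_virtio_interface("ovsbr0", vlan_tag=100))
```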

VM Time Drift

All VMs, whether on KVM, VMware, Xen, or Hyper-V, are prone to clock drift because their clocks are emulated rather than driven directly by hardware, so guest time can gradually diverge from real time. Modern operating systems mitigate this, but in production it is still advisable to use a precise clock source (on KVM guests, the paravirtualized kvm-clock) and run NTP to keep time accurate, especially for time-sensitive services.
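One quick check inside a Linux guest is to read which clock source the kernel is using. The sketch below reads the standard sysfs files; the helper name `clocksource_status` is hypothetical, and treating `kvm-clock` as the desired source is an assumption that applies to KVM guests only.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_status(base=CLOCKSOURCE_DIR):
    """Report the current and available kernel clock sources."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return {
        "current": current,
        "available": available,
        # kvm-clock is the paravirtualized clock source on KVM guests.
        "uses_kvm_clock": current == "kvm-clock",
    }


if __name__ == "__main__":
    print(clocksource_status())
```

This only verifies the clock source; NTP synchronization still has to be configured and monitored separately.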

Disk

For virtual machine disk images, qcow2 or LVM are recommended due to their support for dynamic expansion, snapshots, and thin provisioning, which simplify management.

VirtIO is the standard driver for disk I/O. As a paravirtualized interface, it avoids the overhead of emulating a full hardware disk controller and delivers near-native performance.

Common cache modes include writeback, writethrough, none, and unsafe. On CentOS, writeback is the default, leveraging the host filesystem cache for better performance.

In production, the author prefers writethrough for standalone virtualization to prioritize data safety, and none for clustered environments where live migration is required, since none bypasses the host page cache and leaves no stale cached data behind on the source host.
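The disk recommendations above can be combined into a single libvirt `<disk>` element. The sketch below picks the cache mode from the deployment type following the author's stated preference; the image path, the `vda` target name, and the helper name `virtio_disk` are placeholders.

```python
import xml.etree.ElementTree as ET

# Cache mode per deployment type, following the author's stated preference.
CACHE_BY_DEPLOYMENT = {"standalone": "writethrough", "cluster": "none"}


def virtio_disk(image_path, deployment, target_dev="vda"):
    """Return a libvirt <disk> element (as a string) for a qcow2 image on virtio."""
    disk = ET.Element("disk", type="file", device="disk")
    ET.SubElement(
        disk, "driver",
        name="qemu", type="qcow2", cache=CACHE_BY_DEPLOYMENT[deployment],
    )
    ET.SubElement(disk, "source", file=image_path)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical image path on shared storage.
    print(virtio_disk("/var/lib/libvirt/images/vm01.qcow2", "cluster"))
```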

Virtualization Storage Methods

Standalone virtualization

This model runs multiple VMs on a single host, with compute, storage, and networking all residing on that host. It requires no changes to the existing environment and can be deployed quickly.

Virtualization cluster

This approach combines commercial storage with multiple compute nodes. VM images reside on shared storage, allowing live migration, high availability, and dynamic resource balancing.

Choosing Commercial Storage

Common storage types include file and block storage; block storage can be iSCSI or Fibre Channel (FC). Production environments should use dual‑controller, fully redundant storage to avoid single points of failure.

While FC offers the highest performance at a higher cost, the author prefers iSCSI for its cost‑effectiveness and sufficient performance.

The choice between them typically comes down to three factors:

Business performance requirements

Budget

Familiarity with the technology

Distributed File System

This variant replaces commercial storage with a cluster of ordinary servers, enabling massive scale and dynamic expansion—an architecture commonly used by public clouds.

Typical application scenarios:

Standalone virtualization: high load, low VM density, few VMs per rack.

Cluster virtualization: moderate load, VM density above 1:7 (more than seven VMs per host), many VMs, fast deployment, high availability.

Distributed file system virtualization: overall disk I/O < 1000 IOPS, often combined with commercial clustered storage.

SSD usage in virtualized storage is increasing, and software‑defined storage that combines SSDs with software control is becoming popular.

VM Resource Limits

In production, it is essential to limit VM resources to prevent a single VM from starving others. CGroup provides flexible and granular resource control, though its configuration can be complex.

Libvirt adds a layer on top of CGroup; by editing the VM’s XML definition, resource limits can be applied. Detailed instructions are available on the author’s blog.

http://xiaoli110.blog.51cto.com/1724/1070201
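For orientation, libvirt exposes these limits through tuning elements in the domain XML such as `<cputune>`, `<memtune>`, and `<blkiotune>`. The sketch below generates them; all numeric values are illustrative assumptions, not the author's settings, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def resource_limit_elements(cpu_shares, cpu_period_us, cpu_quota_us,
                            mem_hard_limit_kib, blkio_weight):
    """Return <cputune>, <memtune>, and <blkiotune> fragments as strings."""
    # libvirt documents the enforcement period as 1000 to 1000000 microseconds.
    if not 1000 <= cpu_period_us <= 1000000:
        raise ValueError("cpu_period_us must be between 1000 and 1000000")

    cputune = ET.Element("cputune")
    ET.SubElement(cputune, "shares").text = str(cpu_shares)
    ET.SubElement(cputune, "period").text = str(cpu_period_us)
    # Quota is the CPU time each vCPU may use per period.
    ET.SubElement(cputune, "quota").text = str(cpu_quota_us)

    memtune = ET.Element("memtune")
    ET.SubElement(memtune, "hard_limit", unit="KiB").text = str(mem_hard_limit_kib)

    blkiotune = ET.Element("blkiotune")
    ET.SubElement(blkiotune, "weight").text = str(blkio_weight)

    return [ET.tostring(e, encoding="unicode")
            for e in (cputune, memtune, blkiotune)]


if __name__ == "__main__":
    # Illustrative values: half a core per vCPU, 4 GiB hard memory cap.
    for fragment in resource_limit_elements(1024, 100000, 50000, 4194304, 500):
        print(fragment)
```

The printed fragments belong directly under `<domain>`; libvirt translates them into the corresponding cgroup settings when the VM starts.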
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
