How to Maximize HAProxy Performance with CPU, NIC, and System Tuning
This guide explains how to select optimal hardware, configure CPU affinity, adjust kernel parameters for short and long connections, enable SSL offload, and use HAProxy multi‑process mode to achieve the highest possible throughput and stability.
Hardware and System Selection
In its classic single‑process mode, HAProxy is single‑threaded, non‑blocking, and event‑driven, so one process can saturate at most one CPU core (multithreading was only added in version 1.8). When choosing hardware, prioritize high‑frequency CPUs with large caches over simply increasing core count.
Use NICs that support multiple queues and enable CPU‑affinity for interrupts (e.g., Intel I350AM4 with 8 RX and 8 TX queues per port). Bind HAProxy and NIC interrupts to the same physical CPU but different cores to share L3 cache while avoiding contention.
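As a minimal sketch of the pinning idea above, the following Python helper computes the hexadecimal bitmask that Linux expects in /proc/irq/<n>/smp_affinity for a single core, and spreads NIC queues round-robin over a set of cores while leaving one core free for HAProxy. The core numbers, queue count, and function names are illustrative assumptions, not values from a real system.

```python
def affinity_mask(core: int) -> str:
    """Return the hex bitmask selecting one CPU core (core 0 -> '1').

    Limited to cores 0-31, where the mask fits in one 32-bit group.
    """
    if not 0 <= core < 32:
        raise ValueError("this sketch only handles cores 0-31")
    return format(1 << core, "x")


def plan_irq_affinity(num_queues: int, irq_cores: list[int]) -> dict[int, str]:
    """Assign each NIC queue index to a core, round-robin, as a hex mask."""
    if not irq_cores:
        raise ValueError("at least one core is required for interrupts")
    return {
        queue: affinity_mask(irq_cores[queue % len(irq_cores)])
        for queue in range(num_queues)
    }


# Hypothetical layout: HAProxy on core 0, interrupts on cores 1-3 of the same socket.
plan = plan_irq_affinity(num_queues=8, irq_cores=[1, 2, 3])
for queue, mask in plan.items():
    print(f"queue {queue}: smp_affinity mask {mask}")
```

Each mask would then be written to the smp_affinity file of the IRQ that serves the corresponding queue; the IRQ numbers themselves are listed in /proc/interrupts and differ per machine.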
Kernel Parameter Tuning for Short Connections
Adjust the following sysctl settings to handle high connection rates:
net.ipv4.ip_local_port_range = 1025 65534 (increase local port range)
net.ipv4.tcp_max_syn_backlog = 100000
net.core.netdev_max_backlog = 100000
net.ipv4.tcp_tw_reuse = 1 and net.ipv4.tcp_tw_recycle = 1 (allow reuse of TIME_WAIT sockets; note that tcp_tw_recycle was removed in Linux 4.12 and breaks clients behind NAT, so omit it on modern kernels)
net.core.somaxconn = 65534
fs.file-max = 65535 (increase file descriptor limit)
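To see why the port range and TIME_WAIT handling matter, the sketch below estimates how many new outbound connections per second one source IP can sustain toward a single backend address and port before it runs out of source ports. It assumes the proxy closes the connection first, that each closed connection holds its source port for Linux's fixed 60‑second TIME_WAIT period, and that no reuse takes place. The function name is hypothetical, and the "default" range is the one commonly shipped with recent kernels.

```python
TIME_WAIT_SECONDS = 60  # Linux keeps TIME_WAIT sockets for a fixed 60 seconds


def max_connection_rate(port_low: int, port_high: int,
                        time_wait: int = TIME_WAIT_SECONDS) -> float:
    """Upper bound on new connections per second to one destination ip:port
    from one source IP, assuming every used port then sits in TIME_WAIT."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait


default_rate = max_connection_rate(32768, 60999)  # assumed common default range
tuned_rate = max_connection_rate(1025, 65534)     # range suggested above
print(f"default range: about {default_rate:.0f} connections/s")
print(f"tuned range:   about {tuned_rate:.0f} connections/s")
```

Under these assumptions the wider range roughly doubles the ceiling, and TIME_WAIT reuse is what lifts it further.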
Disable the irqbalance service so that manual interrupt pinning is not overridden, and avoid running HAProxy in a virtual machine when connection rates exceed 5,000 per second. Also avoid iptables connection tracking (conntrack), which degrades performance.
Kernel Parameter Tuning for Long Connections
For persistent connections (e.g., SSL offload), apply these settings:
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000 (note that tcp_mem is counted in memory pages, not bytes)
net.core.rmem_max = 11960320 and net.core.wmem_max = 11960320
net.ipv4.tcp_sack = 0 and net.ipv4.tcp_timestamps = 0 (disable selective ACK and timestamps)
net.ipv4.tcp_slow_start_after_idle = 0 (prevent CWnd reduction on idle connections)
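Large fixed buffers carry a memory cost that is worth estimating before applying them. The sketch below assumes a 4 KiB page size and a worst case in which every socket fills both its receive and send buffers. It converts the tcp_mem page count to bytes and estimates buffer memory for a given number of connections. The connection count is an illustrative figure, not one from the source, and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; verify with `getconf PAGESIZE`
GIB = 1024 ** 3


def tcp_mem_bytes(pages: int, page_size: int = PAGE_SIZE) -> int:
    """tcp_mem thresholds are counted in pages, so convert them to bytes."""
    return pages * page_size


def worst_case_buffer_bytes(connections: int, rmem: int, wmem: int) -> int:
    """Memory needed if every connection fills its receive and send buffers."""
    return connections * (rmem + wmem)


limit = tcp_mem_bytes(10_000_000)
need = worst_case_buffer_bytes(connections=1_000, rmem=10_000_000, wmem=10_000_000)
print(f"tcp_mem limit: {limit / GIB:.1f} GiB")
print(f"worst case for 1,000 connections: {need / GIB:.1f} GiB")
```

If the worst-case figure exceeds the RAM available on the host, smaller buffer values are probably the safer starting point.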
HAProxy Multi‑Process Configuration
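HAProxy can run multiple processes, though the official recommendation is to use a single process. Benefits of multi‑process mode include dedicated cores per process, near‑linear scaling of SSL key computation, and easier horizontal scaling. Drawbacks are increased memory usage, inability to share stick‑tables between processes, and more complex configuration. Note that the nbproc directive used below was removed in HAProxy 2.5 in favor of threads, so this section applies to older releases.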
Example configuration to run four processes and bind each to a specific CPU core:
global
    nbproc 4
    cpu-map 1 0    # process 1 → CPU 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3
Bind frontends to specific processes:
frontend access_http
    bind 0.0.0.0:80
    bind-process 1
frontend access_https
    bind 0.0.0.0:443 ssl crt /etc/yourdomain.pem
    bind-process 2 3 4
NIC Driver Settings
For Intel 10‑GbE NICs (e.g., 82599EB), disable Large Receive Offload (LRO) to reduce latency:
# ethtool -K eth0 lro off
# ethtool -K eth1 lro off
Optionally adjust PCIe settings with setpci as needed.
Additional Process Isolation
Use taskset to bind auxiliary services (e.g., backup clients, Munin, Nagios, SNMP, syslog, Zabbix) to cores separate from HAProxy to avoid CPU contention.
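One way to sketch this isolation is to compute the cores left over after reserving some for HAProxy and NIC interrupts, then build a taskset command for each auxiliary process. The core counts, process names, PIDs, and function names below are hypothetical, and the commands are only printed, not executed.

```python
def spare_cores(total_cores: int, reserved: set[int]) -> list[int]:
    """Cores not reserved for HAProxy or NIC interrupts, in ascending order."""
    return [core for core in range(total_cores) if core not in reserved]


def taskset_command(pid: int, cores: list[int]) -> str:
    """Build a `taskset -cp <cpu-list> <pid>` command for a running process."""
    if not cores:
        raise ValueError("no spare cores available")
    cpu_list = ",".join(str(core) for core in cores)
    return f"taskset -cp {cpu_list} {pid}"


# Hypothetical 8-core host: cores 0-3 are reserved for HAProxy and interrupts.
cores = spare_cores(total_cores=8, reserved={0, 1, 2, 3})
for name, pid in {"rsyslogd": 812, "snmpd": 945}.items():  # example PIDs only
    print(f"{name}: {taskset_command(pid, cores)}")
```

Printing the commands first makes it easy to review the layout before changing the affinity of any live process.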