
Why We Switched to Nginx for L4 Load Balancing: A Practical Migration Guide

This article details a company's migration from commercial load balancers to an open‑source Nginx‑based Layer‑4 solution, covering project background, technical selection, architecture design, network and Nginx configurations, operational scripts, health‑check automation, performance testing, and data analysis using Elasticsearch and Grafana.

21CTO

Project Background

The company has been using commercial load balancers (LB) and wants to replace them with an open‑source solution, for the following reasons:

High price and low HTTPS concurrency.

High technical threshold and learning cost.

Slow bug fixes.

Commercial products lag behind on new features such as HTTP/2 (H2) and PROXY protocol (proxy_protocol) support.

Technical Selection

We evaluated LVS, HAProxy and Nginx for Layer‑4 load‑balancing capabilities. The new LB solution should scale horizontally, so we focused on functional support. The assessment concluded that, because the main traffic is web‑based and the ops team is familiar with Nginx, we will initially implement Nginx for Layer‑4 load balancing.

Solution Design

The architecture places Layer‑4 load balancing at the front end and Layer‑7 load balancing behind it. It also supports multi‑datacenter disaster recovery. Key redundancy components:

Datacenter A and B can operate in active‑standby or active‑active mode.

If OSPF fails, Nginx can act as a backup for Layer‑4 access, pointing directly to the L7 server pool.

Both L7 server pools in the two datacenters provide services simultaneously, avoiding single points of failure.

L7 server pool configurations are kept synchronized via static compiled installations.

Related Configuration

Network Configuration

OS: CentOS 7 (test environment). Install routing software and configure Zebra and OSPFd.

# Install routing software
yum install quagga

# Zebra configuration ( /etc/quagga/zebra.conf )
hostname test-ssl-10-231.test.org
password 8 WuN0UOEsh./0U
enable password 8 g9UPXyneQv2n.
log file /var/log/quagga/zebra.log
service password-encryption

# OSPFd configuration ( /etc/quagga/ospfd.conf )
hostname test-ssl-10-231.test.org
password 8 cQGHF4e9QbcA
enable password 8 RBUKMtvgMhU3M
log file /var/log/quagga/ospfd.log
service password-encryption

interface eth2
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 pIW87ypU3d4v3pG7
 ip ospf hello-interval 1
 ip ospf dead-interval 4
 ip ospf priority 0

router ospf
 ospf router-id 10.10.41.130
 log-adjacency-changes
 network 10.10.41.0/24 area 0.0.0.0
 network 10.10.100.100/32 area 0.0.0.0
 area 0.0.0.0 authentication message-digest

# Enable and start services
systemctl enable zebra.service
systemctl enable ospfd.service
systemctl start zebra.service
systemctl start ospfd.service

# Policy routing (example); the table name wan41 must be defined in /etc/iproute2/rt_tables
ip route add 10.10.41.0/24 dev eth1 src 10.10.41.130 table wan41
ip route add default via 10.10.41.250 table wan41
ip rule add from 10.10.41.130 table wan41
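
The interface stanza above sets a 1-second hello interval and a 4-second dead interval, so a silent neighbour should be declared down after about four missed hellos. As a rough illustration only (it is not part of the original setup), the Python sketch below reads those timers from an ospfd.conf-style snippet and derives that figure. The function name `ospf_timers` and the trimmed sample text are assumptions made for the example.

```python
import re

# Trimmed copy of the interface stanza from ospfd.conf (sample input for the sketch).
SAMPLE_CONF = """\
interface eth2
 ip ospf authentication message-digest
 ip ospf hello-interval 1
 ip ospf dead-interval 4
 ip ospf priority 0
"""


def ospf_timers(conf_text):
    """Return {interface: {"hello": seconds, "dead": seconds}} from ospfd.conf-style text."""
    timers = {}
    current = None
    for line in conf_text.splitlines():
        m = re.match(r"interface\s+(\S+)", line)
        if m:
            current = m.group(1)
            timers[current] = {}
            continue
        m = re.match(r"\s+ip ospf (hello|dead)-interval\s+(\d+)", line)
        if m and current is not None:
            timers[current][m.group(1)] = int(m.group(2))
    return timers


for name, t in ospf_timers(SAMPLE_CONF).items():
    missed = t["dead"] // t["hello"]
    print(f"{name}: neighbour declared down after {t['dead']}s ({missed} missed hellos)")
```

Short timers like these trade extra hello traffic for faster failover, which matters here because OSPF is what steers traffic away from a failed Layer‑4 node.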

Switch Configuration

(omitted)

Enable Zebra/OSPFD keep‑alive

Add the following line to /etc/sysconfig/quagga:

WATCH_DAEMONS="zebra ospfd"

Nginx Layer‑7 Configuration (Client IP logging)

# In the server block: accept the PROXY protocol header sent by the Layer-4 tier
listen 80 proxy_protocol;
listen 443 ssl http2 proxy_protocol;
# In the http block: log the original client address carried in that header
log_format  xff  '$proxy_protocol_addr:$proxy_protocol_port $http_x_forwarded_for - $remote_user [$time_local] "$request" '
                 '$status $body_bytes_sent "$http_referer" "$http_user_agent" "$host" '
                 '$request_time "$upstream_addr" "$upstream_response_time" "$server_protocol"';
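
The `proxy_protocol` listen parameter makes the Layer‑7 Nginx expect a PROXY protocol header at the start of every connection, and `$proxy_protocol_addr` is filled from it. To show what that header carries, here is a minimal Python sketch that parses the version 1 (text) form of the header. It assumes the v1 format; the function name `parse_proxy_v1` and the sample addresses are hypothetical.

```python
def parse_proxy_v1(header: bytes):
    """Parse a PROXY protocol v1 line into (src_ip, src_port, dst_ip, dst_port).

    Returns None for the "PROXY UNKNOWN" form, which carries no usable addresses.
    """
    if not header.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = header[:-2].decode("ascii").split(" ")
    if len(parts) < 2 or parts[0] != "PROXY":
        raise ValueError("not a PROXY v1 header")
    if parts[1] == "UNKNOWN":
        return None
    if parts[1] not in ("TCP4", "TCP6") or len(parts) != 6:
        raise ValueError("malformed PROXY v1 header")
    # Field order in v1: protocol, source address, destination address,
    # source port, destination port.
    _, _, src_ip, dst_ip, src_port, dst_port = parts
    return src_ip, int(src_port), dst_ip, int(dst_port)


# Hypothetical header for a client at 203.0.113.7 connecting to a VIP on port 443.
print(parse_proxy_v1(b"PROXY TCP4 203.0.113.7 10.10.100.100 51234 443\r\n"))
```

Because the header arrives before any TLS or HTTP bytes, every listener that receives traffic from the Layer‑4 tier has to expect it; a listener without `proxy_protocol` would treat the header as a malformed request.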

Nginx TCP (Layer‑4) Proxy Configuration

stream {
    log_format proxy '$remote_addr:$remote_port [$time_local] $protocol $status $bytes_sent $bytes_received $session_time "$upstream_addr" "$upstream_bytes_sent" "$upstream_bytes_received" "$upstream_connect_time"';
    upstream backend-test { server 10.x.x.233:80; }
    upstream backend-test_ssl { server 10.x.x.233:443; }
    server {
        listen 80;
        proxy_protocol on;
        proxy_pass backend-test;
        access_log /opt/test/logs/nginx/m.test.com.log proxy;
    }
    server {
        listen 443;
        proxy_protocol on;
        proxy_pass backend-test_ssl;
        access_log /opt/test/logs/nginx/m.test.com.log proxy buffer=1k flush=1s;
    }
}
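
The `proxy` log format above writes one line per TCP session. As a hedged sketch of how such lines could be turned into structured records before they are shipped to a log store, the Python below parses one line with a regular expression. The sample line is invented for illustration, and `parse_stream_log` is a hypothetical helper name.

```python
import re

# Mirrors the field order of the stream log_format named "proxy".
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+):(?P<remote_port>\d+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'(?P<protocol>\S+) (?P<status>\d+) '
    r'(?P<bytes_sent>\d+) (?P<bytes_received>\d+) (?P<session_time>[\d.]+) '
    r'"(?P<upstream_addr>[^"]*)" "(?P<upstream_bytes_sent>[^"]*)" '
    r'"(?P<upstream_bytes_received>[^"]*)" "(?P<upstream_connect_time>[^"]*)"'
)

# Invented sample line; real values depend on your traffic.
SAMPLE = ('203.0.113.7:51234 [12/Mar/2019:10:00:00 +0800] TCP 200 1024 512 0.120 '
          '"10.0.0.233:80" "512" "1024" "0.001"')


def parse_stream_log(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Convert the always-numeric client-side fields; upstream fields stay as
    # strings because nginx may log "-" or several comma-separated values.
    for key in ("remote_port", "status", "bytes_sent", "bytes_received"):
        rec[key] = int(rec[key])
    rec["session_time"] = float(rec["session_time"])
    return rec


print(parse_stream_log(SAMPLE))
```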

Systemd Service for Nginx

[Unit]
Description=nginx
After=network.target

[Service]
Type=forking
PIDFile=/opt/test/nginx/logs/nginx.pid
ExecStart=/opt/test/nginx/sbin/nginx
ExecReload=/opt/test/nginx/sbin/nginx -s reload
ExecStop=/opt/test/nginx/sbin/nginx -s stop
PrivateTmp=true

[Install]
WantedBy=multi-user.target

# Enable at boot
systemctl enable nginx.service

Operations Management

Add IP Script

#!/bin/bash
ip=$1
pswd="test123"
expect -c "
set timeout 30
spawn -noecho telnet 127.0.0.1 2604
expect \"Password:\"
send \"$pswd\r\"
expect \" *>\"
send \"enable\r\"
expect \"Password:\"
send \"$pswd\r\"
expect \" *#\"
send \"configure t\r\"
expect \" *(config)#\"
send \"router ospf\r\"
expect \" *(config-router)#\"
send \"network $ip/32 area 0.0.0.0\r\"
expect \" *(config-router)#\"
send \"w\r\"
send \"exit\r\"
send \"exit\r\"
send \"exit\r\"
" > /dev/null
# Add the address and policy routing for the new IP (e.g. 10.10.100.103)
ip addr add "$ip/32" dev lo:1
ip rule add from "$ip" table wan41
# Persist across reboots by adding this line to the rule-lo:1 config file:
#   from 10.10.100.103 table wan41
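
The script passes its argument straight into the OSPF `network` statement, so a typo would end up in the routing configuration. One possible safeguard, sketched here in Python under the assumption that only single IPv4 addresses are announced, is to validate the address first and build the command sequence from the validated value. `ospf_announce_commands` is a hypothetical helper; it mirrors the commands sent by the expect block and leaves out the password prompts.

```python
import ipaddress


def ospf_announce_commands(ip, area="0.0.0.0"):
    """Build the vty command sequence that announces a /32 address via OSPF.

    Raises ValueError if `ip` is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip)  # AddressValueError is a ValueError subclass
    return [
        "enable",
        "configure t",
        "router ospf",
        f"network {addr}/32 area {area}",
        "w",      # abbreviation of "write", as in the script
        "exit",
        "exit",
        "exit",
    ]


for cmd in ospf_announce_commands("10.10.100.103"):
    print(cmd)
```

Rejecting bad input before opening the telnet session keeps a malformed address from being written into the saved ospfd configuration.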

Health Check and Alerting

# Monit configuration example
set mailserver mail.test.com port 25
set mail-format {
    from:[email protected]
    subject:Nginx-L4 $SERVICE $EVENT at $DATE
    message:Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}
set alert [email protected]
check process nginx with pidfile /opt/test/nginx/logs/nginx.pid
    if does not exist for 3 cycles then exec "/bin/systemctl stop zebra"
    else if succeeded for 3 cycles then exec "/bin/sh /opt/test/sysadmin/start.sh"
check host Nginx-L4 with address 10.x.x.250
    if failed ping count 5 then exec "/bin/systemctl stop zebra"
    else if succeeded then exec "/bin/sh /opt/test/sysadmin/start.sh"
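
The process check above acts only after three consecutive cycles in the same state, which keeps a single missed check from taking the node out of the OSPF path. The Python sketch below is a simplified model of that debounce logic, not Monit's implementation: the class name `CycleGate` and the action labels "withdraw" and "announce" are assumptions standing in for stopping zebra and running start.sh.

```python
class CycleGate:
    """Fire an action only after `threshold` consecutive identical check results."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streak = 0
        self.last = None
        self.announced = True  # assume the node starts out in service

    def observe(self, healthy):
        """Record one check cycle; return "withdraw", "announce", or None."""
        if healthy == self.last:
            self.streak += 1
        else:
            self.last = healthy
            self.streak = 1
        if self.streak < self.threshold:
            return None
        if not healthy and self.announced:
            self.announced = False
            return "withdraw"   # stands in for: systemctl stop zebra
        if healthy and not self.announced:
            self.announced = True
            return "announce"   # stands in for: start.sh
        return None


gate = CycleGate(threshold=3)
checks = [True, False, False, False, False, True, True, True]
print([gate.observe(c) for c in checks])
```

In this model an action fires once per state change; a node that flaps faster than three cycles never changes state, which is the point of the threshold.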

Performance Test Data

The main test measured Layer‑7 SSL encryption/decryption capacity with RSA‑2048 keys. After an acceleration card was added to a server with a 2620 CPU, it sustained 26,000 TPS under concurrent load.

Data Analysis

Bandwidth, traffic, and page‑view (PV) figures are queried through the Elasticsearch (ES) API, the results are stored back into ES, and Grafana visualizes them.
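
The article does not show the queries involved. As one possible shape for the bandwidth calculation, the Python sketch below builds an Elasticsearch aggregation body that sums bytes per time bucket and converts the bucket sums to megabits per second. The field names `bytes_sent` and `@timestamp`, the one-minute interval, and the sample bucket are all assumptions; `fixed_interval` is the parameter name in recent Elasticsearch releases, while older ones use `interval`.

```python
def bandwidth_query(interval="1m", bytes_field="bytes_sent", time_field="@timestamp"):
    """Build a search body that sums bytes per time bucket (no hits returned)."""
    return {
        "size": 0,
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": time_field, "fixed_interval": interval},
                "aggs": {"bytes": {"sum": {"field": bytes_field}}},
            }
        },
    }


def buckets_to_mbps(buckets, interval_seconds=60):
    """Convert per-bucket byte sums to average megabits per second."""
    return [
        {
            "time": b["key_as_string"],
            "mbps": b["bytes"]["value"] * 8 / interval_seconds / 1e6,
        }
        for b in buckets
    ]


# Invented bucket in the shape Elasticsearch returns for the query above.
sample_buckets = [
    {"key_as_string": "2019-03-12T10:00:00.000Z", "doc_count": 1200,
     "bytes": {"value": 750_000_000}},
]
print(bandwidth_query())
print(buckets_to_mbps(sample_buckets))
```

The converted rows could then be indexed into a separate summary index for Grafana to chart, which matches the "store back into ES" step described above.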
