Design and Implementation of an Open‑Source Load Balancing Solution Using Nginx and LVS
The article describes how a company replaced costly commercial load balancers with an open‑source architecture in which Nginx handles layer‑4 traffic in front of a layer‑7 load‑balancing cluster, covering the project background, technology selection, redundant design, network and Nginx configuration, operational scripts, performance testing, and data analysis.
Project background: the company used commercial load balancers but sought open‑source alternatives because of high price, low HTTPS concurrency, steep learning curve, slow bug fixes, and lagging feature support.
Technical selection: LVS, HAProxy, and Nginx were evaluated for layer‑4 load balancing; Nginx was chosen because the team was already familiar with it and the traffic is primarily web (HTTP/HTTPS).
Solution design: a two‑layer architecture with Nginx handling layer‑4 traffic at the front and a layer‑7 load‑balancing cluster behind it; redundancy across two data centers (A and B) to avoid single points of failure, including dual‑active or primary‑backup modes.
Related configurations – network: OSPF routing using Quagga on CentOS 7, with detailed zebra and ospfd configuration, static routes, policy routing tables, and fail‑over settings.
# OS: CentOS 7 (test environment); adjust accordingly for production
# Install routing software
yum install quagga
# zebra configuration
cat /etc/quagga/zebra.conf
!
! Zebra configuration saved from vty
! 2017/09/28 15:57:12
!
hostname test-ssl-10-231.test.org # each host must have a unique name
password 8 WuN0UOEsh./0U
enable password 8 g9UPXyneQv2n.
log file /var/log/quagga/zebra.log
service password-encryption
# ospfd configuration
cat /etc/quagga/ospfd.conf
hostname test-ssl-10-231.test.org # each host must have a unique name
password 8 cQGHF4e9QbcA
enable password 8 RBUKMtvgMhU3M
log file /var/log/quagga/ospfd.log
service password-encryption
!
interface eth2
ip ospf authentication message-digest
ip ospf message-digest-key 1 md5 pIW87ypU3d4v3pG7 # MD5 key shared with the network engineers
ip ospf hello-interval 1
ip ospf dead-interval 4
ip ospf priority 0
router ospf
ospf router-id 10.10.41.130 # unique per router
log-adjacency-changes
network 10.10.41.0/24 area 0.0.0.0 # OSPF neighbor network
network 10.10.100.100/32 area 0.0.0.0 # announce the VIP
area 0.0.0.0 authentication message-digest
!
line vty
!
# Enable services
systemctl enable zebra.service
systemctl enable ospfd.service
systemctl start zebra.service
systemctl start ospfd.service
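Once both daemons are running, it is worth confirming that the OSPF adjacency actually forms before relying on it for fail-over. A diagnostic sketch (`vtysh` ships with Quagga; addresses follow the configuration above):

```shell
# Show OSPF neighbors; the state with the upstream switch should reach "Full"
vtysh -c 'show ip ospf neighbor'
# Confirm the routes zebra has pushed into the kernel routing table
ip route show proto zebra
```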
# OSPF and zebra keep‑alive configuration
vim /etc/sysconfig/quagga
WATCH_DAEMONS="zebra ospfd"
# Policy routing configuration (example)
cat /etc/iproute2/rt_tables
100 wan41
ip route add 10.10.41.0/24 dev eth1 src 10.10.41.130 table wan41
ip route add default via 10.10.41.250 table wan41
ip rule add from 10.10.41.130 table wan41
cat /etc/sysconfig/network-scripts/route-eth1
10.10.41.0/24 dev eth1 src 10.10.41.130 table wan41
default via 10.10.41.250 table wan41
cat /etc/sysconfig/network-scripts/rule-eth1
from 10.10.41.130 table wan41
Switch configuration is omitted.
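A quick sanity check of the policy routing (illustrative diagnostic commands; the table and address names follow the example above):

```shell
# Confirm the source-based rule and the per-table routes are in place
ip rule show | grep wan41
ip route show table wan41
# Verify which path a packet sourced from the local OSPF address would take
ip route get 10.10.41.250 from 10.10.41.130
```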
Additional settings: enable zebra and ospfd keep‑alive, configure Nginx layer‑7 logging to capture client IP via proxy protocol, and set up Nginx stream module for TCP layer‑4 proxying.
# Nginx layer‑7 configuration (log format)
listen 80 proxy_protocol;
listen 443 http2 proxy_protocol;
log_format xff '$proxy_protocol_addr:$proxy_protocol_port $http_x_forwarded_for - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" "$http_user_agent" "$host" '
'$request_time "$upstream_addr" "$upstream_response_time" "$server_protocol"';
# Nginx TCP layer‑4 proxy configuration (stream module)
stream {
log_format proxy '$remote_addr:$remote_port [$time_local] '
'$protocol $status $bytes_sent $bytes_received '
'$session_time "$upstream_addr" '
'"$upstream_bytes_sent" "$upstream_bytes_received" "$upstream_connect_time"';
upstream backend-test {
server 10.x.x.233:80;
}
upstream backend-test_ssl {
server 10.x.x.233:443;
}
server {
listen 80;
proxy_protocol on;
proxy_pass backend-test;
access_log /opt/test/logs/nginx/m.test.com.log proxy;
}
server {
listen 443;
proxy_protocol on;
proxy_pass backend-test_ssl;
access_log /opt/test/logs/nginx/m.test.com.log proxy buffer=1k flush=1s;
}
}
Systemd unit file for Nginx to manage the service and enable autostart:
[Unit]
Description=nginx
After=network.target
[Service]
Type=forking
ExecStart=/opt/test/nginx/sbin/nginx
ExecReload=/opt/test/nginx/sbin/nginx -s reload
ExecStop=/opt/test/nginx/sbin/nginx -s stop
PrivateTmp=true
[Install]
WantedBy=multi-user.target
# Enable on boot
systemctl enable nginx.service
Operations management scripts: a Bash script (addip.sh) adds OSPF routes automatically, and Monit configuration snippets health-check Nginx and stop zebra to trigger fail-over when needed.
#!/bin/bash
# addip.sh – announce a new VIP into OSPF via the ospfd telnet vty (port 2604)
ip=$1
pswd="test123"
expect -c " set timeout 30
eval spawn -noecho telnet 127.0.0.1 2604
expect \"Password:\"
send \"$pswd\r\"
expect \" *>\"
send \"enable\r\"
expect \"Password:\"
send \"$pswd\r\"
expect \" *#\"
send \"configure t\r\"
expect \" *(config)#\"
send \"router ospf\r\"
expect \" *(config-router)#\"
send \"network $ip/32 area 0.0.0.0\r\"
expect \" *(config-router)#\"
send \"w\r\"
send \"exit\r\"
send \"exit\r\"
send \"exit\r\"
interact" >/dev/null
# Add policy routing
ip addr add 10.10.100.103/32 dev lo:1
ip rule add from 10.10.100.103 table wan41
# Persist to file
# rule-lo:1
from 10.10.100.103 table wan41
# Monit configuration for automatic OSPF/Nginx fail‑over
set mailserver mail.test.com port 25
set mail-format {
from:[email protected]
subject:Nginx-L4 $SERVICE $EVENT at $DATE
message:Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}
set alert [email protected]
check process nginx with pidfile /opt/test/nginx/logs/nginx.pid
if does not exist for 3 cycles then exec "/bin/systemctl stop zebra"
else if succeeded for 3 cycles then exec "/bin/sh /opt/test/sysadmin/start.sh"
check host Nginx-L4 with address 10.x.x.250
if failed ping count 5 with timeout 1 seconds then exec "/bin/systemctl stop zebra"
else if succeeded then exec "/bin/sh /opt/test/sysadmin/start.sh"
Performance testing showed that with RSA‑2048 TLS and a 2620 CPU paired with an SSL accelerator card, the layer‑7 service sustains up to 26,000 transactions per second.
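A TPS figure like this can be reproduced with any HTTPS load generator. A hedged sketch using `wrk` (the URL, thread count, connection count, and duration here are illustrative, not taken from the original test):

```shell
# 8 threads, 1000 concurrent TLS connections, 60-second run against the VIP;
# "Requests/sec" in wrk's output corresponds to the transactions-per-second figure
wrk -t8 -c1000 -d60s https://10.10.100.100/
```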
Data analysis: bandwidth, traffic, and page‑view metrics are collected via Elasticsearch APIs, aggregated, stored back into Elasticsearch, and visualized using Grafana.
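Before the metrics reach Elasticsearch, the raw stream-module access log can already be aggregated locally. A minimal sketch (the sample log lines and file path are made up, following the `proxy` log format defined earlier; production uses Elasticsearch and Grafana instead):

```shell
#!/bin/sh
# Hypothetical sample of the stream-module "proxy" log format:
# addr:port [time tz] protocol status bytes_sent bytes_received session_time "upstream" ...
cat > /tmp/m.test.com.sample.log <<'EOF'
10.0.0.1:51234 [28/Sep/2017:15:57:12 +0800] TCP 200 1024 512 0.100 "10.x.x.233:80" "900" "400" "0.001"
10.0.0.2:51235 [28/Sep/2017:15:57:13 +0800] TCP 200 2048 256 0.200 "10.x.x.233:80" "1900" "200" "0.001"
EOF
# Sum bytes sent/received per upstream: field 6 is bytes_sent, field 7 is
# bytes_received, field 9 is the quoted upstream address (the timestamp
# occupies fields 2-3 because of the space before the timezone)
awk '{sent[$9]+=$6; recv[$9]+=$7}
     END {for (u in sent) printf "%s sent=%d recv=%d\n", u, sent[u], recv[u]}' \
    /tmp/m.test.com.sample.log
```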
Architecture Digest