How to Supercharge Nginx for Millions of QPS: A Complete Guide
Discover proven strategies to optimize Nginx under extreme traffic, covering benchmark testing, kernel tuning, configuration tweaks, caching, load balancing, SSL hardening, monitoring, and real-world case studies that demonstrate how to achieve stable high‑QPS performance while minimizing latency and resource usage.
Nginx High‑Concurrency Performance Tuning and Architecture Design: A Complete Guide
Introduction: Why You Need Nginx Performance Tuning
During a Double‑11 promotion, an e‑commerce platform saw 8 million users within three minutes and QPS spike to 500 k, while the optimized Nginx cluster kept CPU usage below 60%.
This is the power of Nginx performance tuning.
If you are facing any of the following problems:
Frequent 502/504 errors when traffic surges
Nginx CPU usage stays high but QPS does not increase
Unclear how to design a highly available Nginx architecture
Want to squeeze maximum server performance without a clear starting point
This article will help you solve these issues by sharing five years of large‑scale production experience, pitfalls, and exclusive optimization tricks.
1. Performance Benchmarking: Know Your Baseline
Before optimizing, understand the current performance baseline. Many jump straight to parameter tweaking, which is a typical mistake.
1.1 Benchmark Tool Selection and Usage
# Use wrk for benchmarking
wrk -t12 -c400 -d30s --latency http://your-domain.com/
# Use ab for simple testing
ab -n 100000 -c 1000 http://your-domain.com/
# Use vegeta for more precise testing
echo "GET http://your-domain.com/" | vegeta attack -duration=30s -rate=10000 | vegeta report
Practical Tips: Monitor the following key metrics during benchmarking:
QPS/TPS
Response time distribution (P50, P95, P99)
Error rate
CPU/memory/network/disk I/O usage
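The response time distribution listed above can be derived from raw per-request latencies. The following is a minimal sketch, assuming one numeric latency per line on standard input; the function name `percentile` and the nearest-rank method are choices made for this illustration, not something the benchmarking tools prescribe.

```shell
# percentile P : read one latency per line from stdin, print the P-th percentile.
# Hypothetical helper using the nearest-rank method; P is an integer from 1 to 100.
percentile() {
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }                       # store the sorted values by rank
        END {
            if (NR == 0) exit 1              # no input, nothing to report
            idx = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
            if (idx < 1) idx = 1
            print v[idx]
        }'
}
```

For example, `printf '%s\n' 10 20 30 40 | percentile 50` selects the second of the four sorted values.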
1.2 Performance Bottleneck Identification
Based on experience, Nginx bottlenecks usually appear in these areas:
Connection limits: default file descriptor limits
CPU bottleneck: improper worker process configuration
Memory bottleneck: unreasonable buffer settings
Network I/O bottleneck: uneven NIC interrupt handling
Disk I/O bottleneck: log writing slows overall performance
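To check whether connection limits are the bottleneck, it helps to estimate how many clients a configuration can hold at once. The sketch below is a rough rule of thumb, not an exact model: `worker_connections` counts every connection a worker holds, including connections to proxied servers, so a reverse proxy is assumed here to spend two connections per client. The function name `max_clients` and its `proxy` mode argument are hypothetical.

```shell
# max_clients WORKERS CONNECTIONS [MODE]
# Rough upper bound on simultaneous clients.
# MODE=proxy halves the result (assumption: one client connection plus one
# upstream connection per request); any other MODE counts one connection per client.
max_clients() {
    workers=$1
    conns=$2
    mode=${3:-static}
    total=$((workers * conns))
    if [ "$mode" = "proxy" ]; then
        total=$((total / 2))
    fi
    echo "$total"
}
```

With 32 workers and `worker_connections 65535`, `max_clients 32 65535 proxy` gives about one million proxied clients, which is also roughly the number of file descriptors the workers need to be allowed to open.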
2. System‑Level Optimizations: Laying the Foundation
2.1 Kernel Parameter Tuning
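The following parameters are a starting point for a dedicated high‑traffic proxy host; validate each value against your kernel version and memory budget before applying it in production: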
# /etc/sysctl.conf
# Maximum number of file handles
fs.file-max = 2000000
fs.nr_open = 2000000
# Network optimizations
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
# tcp_tw_recycle was removed in Linux 4.12; keep this line only on older kernels, and keep it at 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
# TCP buffer optimizations
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216
# Connection tracking table
net.netfilter.nf_conntrack_max = 1000000
net.nf_conntrack_max = 1000000
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
# BBR congestion control (kernel 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Key Points:
tcp_tw_reuse: Allows TIME_WAIT sockets to be reused for new outgoing connections (for example, Nginx to upstream), which helps with short‑lived high‑concurrency connections.
somaxconn: Upper bound on the listen backlog that Nginx can request; must be increased.
BBR algorithm: Google’s congestion control that improves performance on high‑latency networks.
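The somaxconn point deserves one caveat: the kernel caps the backlog an application requests at `net.core.somaxconn`, but Nginx on Linux requests a backlog of 511 unless the `listen` directive carries a `backlog=` parameter. Raising the sysctl alone therefore leaves the accept queue at 511. The sketch below only illustrates that min() relationship; the function name `effective_backlog` is hypothetical.

```shell
# effective_backlog REQUESTED SOMAXCONN
# The accept queue length is the smaller of what the application asks for
# in listen() and the kernel-wide net.core.somaxconn limit.
effective_backlog() {
    requested=$1
    somaxconn=$2
    if [ "$requested" -lt "$somaxconn" ]; then
        echo "$requested"
    else
        echo "$somaxconn"
    fi
}
```

On a live host the second argument can come from `sysctl -n net.core.somaxconn`. To benefit from the larger limit, the listen line would need something like `listen 80 backlog=65535;`.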
2.2 File Descriptor Limits
# /etc/security/limits.conf
* soft nofile 1000000
* hard nofile 1000000
* soft nproc 1000000
* hard nproc 1000000
# For systemd‑managed services
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
3. Nginx Configuration Optimizations: Core Tuning
3.1 Global Configuration
# nginx.conf
user nginx;
worker_processes auto;
worker_rlimit_nofile 1000000;
worker_cpu_affinity auto;
error_log /var/log/nginx/error.log error;
events {
use epoll;
worker_connections 65535;
multi_accept on;
accept_mutex off;
}
http {
# Basic optimizations
sendfile on;
tcp_nopush on;
tcp_nodelay on;
# Connection timeout
keepalive_timeout 65;
keepalive_requests 10000;
reset_timedout_connection on;
client_body_timeout 10;
client_header_timeout 10;
send_timeout 10;
# Buffer settings
client_body_buffer_size 128k;
client_max_body_size 10m;
client_header_buffer_size 1k;
large_client_header_buffers 4 8k;
output_buffers 32 128k;
postpone_output 1460;
# File cache
open_file_cache max=200000 inactive=20s;
open_file_cache_valid 30s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
# Gzip compression
gzip on;
gzip_min_length 1k;
gzip_buffers 16 64k;
gzip_http_version 1.1;
gzip_comp_level 6;
gzip_types text/plain application/javascript application/x-javascript text/css application/xml text/javascript;
gzip_vary on;
gzip_proxied any;
gzip_disable "MSIE [1-6]\.";
# Hide version
server_tokens off;
# Server name hash
server_names_hash_bucket_size 128;
server_names_hash_max_size 512;
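# Access log ("main" is not a predefined format, so define it before use)
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" $request_time';
access_log /var/log/nginx/access.log main buffer=32k flush=5s;
}
3.2 Upstream Server Configuration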
upstream backend {
least_conn;
keepalive 300;
keepalive_requests 10000;
keepalive_timeout 60s;
server backend1.example.com:8080 max_fails=2 fail_timeout=10s weight=5;
server backend2.example.com:8080 max_fails=2 fail_timeout=10s weight=5;
server backend3.example.com:8080 max_fails=2 fail_timeout=10s weight=5 backup;
# Health check (requires nginx_upstream_check_module)
check interval=3000 rise=2 fall=3 timeout=1000 type=http;
check_http_send "HEAD /health HTTP/1.0\r\n\r\n";
check_http_expect_alive http_2xx http_3xx;
}
server {
listen 80 default_server reuseport;
listen [::]:80 default_server reuseport;
server_name _;
location / {
proxy_pass http://backend;
# Proxy optimizations
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_connect_timeout 10s;
proxy_send_timeout 10s;
proxy_read_timeout 10s;
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 32 4k;
proxy_busy_buffers_size 64k;
proxy_temp_file_write_size 64k;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_cache_bypass $http_upgrade;
proxy_no_cache $http_upgrade;
}
location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
expires 30d;
add_header Cache-Control "public, immutable";
sendfile on;
tcp_nopush on;
access_log off;
valid_referers none blocked server_names ~\.google\. ~\.baidu\. ~\.bing\.;
if ($invalid_referer) { return 403; }
}
}
3.3 Static Resource Optimization
location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
expires 30d;
add_header Cache-Control "public, immutable";
sendfile on;
tcp_nopush on;
access_log off;
valid_referers none blocked server_names ~\.google\. ~\.baidu\. ~\.bing\.;
if ($invalid_referer) { return 403; }
}
4. Advanced Optimization Techniques
4.1 Cache Strategy Optimization
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:100m max_size=10g inactive=60m use_temp_path=off;
server {
location /api/ {
# $request_uri already contains the query string, so $is_args$args is not repeated
proxy_cache_key "$scheme$request_method$host$request_uri";
proxy_cache my_cache;
proxy_cache_valid 200 302 10m;
proxy_cache_valid 404 1m;
proxy_cache_valid any 1m;
proxy_cache_lock on;
proxy_cache_lock_timeout 5s;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
add_header X-Cache-Status $upstream_cache_status;
proxy_cache_background_update on;
proxy_cache_revalidate on;
}
}
4.2 Rate Limiting
# geo and map are only valid at the http level, so they sit outside server {}
geo $limit_whitelist {
default 0;
10.0.0.0/8 1;
192.168.0.0/16 1;
}
# Whitelisted clients get an empty key; requests with an empty key are not limited
map $limit_whitelist $limit_req_key {
0 $binary_remote_addr;
1 "";
}
limit_req_zone $limit_req_key zone=perip:10m rate=10r/s;
limit_req_zone $server_name zone=perserver:10m rate=1000r/s;
limit_conn_zone $binary_remote_addr zone=connperip:10m;
server {
limit_req zone=perip burst=20 delay=10;
limit_conn connperip 10;
}
4.3 SSL/TLS Optimization
server {
listen 443 ssl http2 reuseport;
ssl_certificate /path/to/cert.pem;
ssl_certificate_key /path/to/key.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:HIGH:!aNULL:!MD5:!RC4:!DHE;
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /path/to/chain.pem;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}
5. High‑Availability Architecture Design
5.1 Master‑Slave Architecture
# keepalived configuration example
vrrp_script check_nginx {
script "/usr/local/bin/check_nginx.sh"
interval 2
weight -5
fall 3
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1234
}
virtual_ipaddress {
192.168.1.100
}
track_script {
check_nginx
}
}
5.2 Load‑Balancing Architecture
In ultra‑high‑concurrency scenarios, a four‑layer plus seven‑layer load‑balancing architecture is often used:
Internet
↓
LVS/F5 (Layer‑4 load balancer)
↓
Nginx cluster (Layer‑7 load balancer)
↓
Application server cluster
Advantages:
LVS can handle tens of millions of QPS.
Nginx provides flexible Layer‑7 load balancing and caching.
Dual‑layer load balancing improves high availability.
5.3 Static‑Dynamic Separation Architecture
# CDN origin configuration
location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
add_header Cache-Control "public, max-age=31536000";
# Origin authentication
set $auth_token "";
if ($http_x_cdn_auth = "your-secret-token") {
set $auth_token "valid";
}
if ($auth_token != "valid") { return 403; }
}
location /api/ {
proxy_pass http://backend;
add_header Cache-Control "no-cache, no-store, must-revalidate";
}
6. Monitoring and Troubleshooting
6.1 Performance Monitoring
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
# Requires the third-party nginx-module-vts and vhost_traffic_status_zone in the http block
location /status {
vhost_traffic_status_display;
vhost_traffic_status_display_format html;
allow 127.0.0.1;
deny all;
}
6.2 Log Analysis
# Find most active IPs
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# Analyze response times (assumes the last field of each log line is $request_time)
awk '{print $NF}' access.log | sort -n | awk '
{
count[NR] = $1;
sum += $1
}
END {
print "Average:", sum/NR;
print "P50:", count[int(NR*0.5)];
print "P95:", count[int(NR*0.95)];
print "P99:", count[int(NR*0.99)];
}'
# Real‑time error monitoring
tail -f error.log | grep -E "error|alert|crit"
6.3 Performance Analysis Tools
# Nginx Amplify
curl -L -O https://github.com/nginxinc/nginx-amplify-agent/raw/master/packages/install.sh
sh ./install.sh
# ngxtop
ngxtop -l /var/log/nginx/access.log
# GoAccess
goaccess /var/log/nginx/access.log -o report.html --log-format=COMBINED
7. Real‑World Case Studies
Case 1: E‑commerce Flash Sale Handling Millions of QPS
Background: Expected QPS peak of 1 million during Double‑11.
Solution:
Deploy 20 Nginx servers, each with 32 CPU cores and 64 GB RAM.
Use LVS for Layer‑4 load balancing.
Push all static assets to CDN.
Cache hot data with Redis.
Configure rate limiting to block abusive traffic.
Results:
Actual peak QPS: 1.2 million
Average latency: 50 ms
P99 latency: 200 ms
Error rate: 0.01 %
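The figures in this case can be turned into per-server numbers with some simple arithmetic. The sketch below is a back-of-the-envelope estimate, assuming traffic is spread evenly across servers and that the average latency approximates the time each request spends in the system (Little's law: in-flight requests = arrival rate × time in system). The function names are hypothetical.

```shell
# per_node_qps TOTAL_QPS SERVERS : even split of traffic across servers
per_node_qps() {
    echo $(( $1 / $2 ))
}

# in_flight QPS AVG_LATENCY_MS : Little's law estimate of concurrent requests
in_flight() {
    awk -v qps="$1" -v ms="$2" 'BEGIN { printf "%d\n", qps * ms / 1000 }'
}
```

For the numbers above, `per_node_qps 1200000 20` gives 60000 requests per second per server, and `in_flight 1200000 50` suggests on the order of 60000 requests in flight across the cluster at peak.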
Case 2: API Gateway Performance Optimization
Background: API gateway became a bottleneck in a micro‑service architecture.
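# Dynamic routing with Lua (requires OpenResty or lua-nginx-module)
# The backend names must be defined as upstream groups, or a resolver must be configured
location /api {
set $backend '';
rewrite_by_lua_block {
-- Route table: URI prefix -> upstream URL
local routes = {
["/api/user"] = "http://user-service",
["/api/order"] = "http://order-service",
["/api/product"] = "http://product-service"
}
for prefix, backend in pairs(routes) do
-- Plain-text prefix match; fixed prefixes do not need a regex
if ngx.var.uri:find(prefix, 1, true) == 1 then
ngx.var.backend = backend
break
end
end
-- No route matched: fail fast instead of proxying to an empty URL
if ngx.var.backend == "" then
return ngx.exit(ngx.HTTP_NOT_FOUND)
end
}
proxy_pass $backend;
}
Impact: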
QPS increased by 300 %.
Latency reduced by 60 %.
CPU usage dropped 40 %.
8. Common Issues and Solutions
8.1 502 Bad Gateway
Typical Causes:
Backend server crash.
Connection timeout too short.
Insufficient buffer size.
Fixes:
# Increase timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
# Increase buffers
proxy_buffer_size 64k;
proxy_buffers 32 32k;
proxy_busy_buffers_size 128k;
8.2 504 Gateway Timeout
# Extend timeouts
proxy_read_timeout 300s;
fastcgi_read_timeout 300s;
# Enable keepalive
upstream backend {
server backend1.example.com:8080;
keepalive 32;
}
8.3 High Memory Usage
Reduce number of worker processes.
Optimize buffer sizes.
Limit request body size.
Reload configuration periodically to free memory.
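When optimizing buffer sizes, it is worth estimating how much memory the proxy buffers could consume at peak concurrency. The sketch below computes a rough upper bound, assuming every proxied connection fills its header buffer (`proxy_buffer_size`) and all of its `proxy_buffers`. Nginx allocates these buffers on demand, so real usage is usually lower. The function name and argument order are hypothetical.

```shell
# proxy_buffer_mem_mb CONNECTIONS HEADER_KB COUNT SIZE_KB
# Worst-case proxy buffer memory in MB (integer, rounded down):
#   per connection = proxy_buffer_size + (proxy_buffers count * size)
proxy_buffer_mem_mb() {
    conns=$1
    header_kb=$2
    count=$3
    size_kb=$4
    per_conn_kb=$((header_kb + count * size_kb))
    echo $((conns * per_conn_kb / 1024))
}
```

For 10,000 concurrent proxied connections, the section 3.2 values (`proxy_buffer_size 4k`, `proxy_buffers 32 4k`) come out near 1.3 GB with `proxy_buffer_mem_mb 10000 4 32 4`, while the enlarged 502 fix values (`64k` and `32 32k`) come out above 10 GB with `proxy_buffer_mem_mb 10000 64 32 32`.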
9. Performance Test Comparison
| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| QPS | 5,000 | 50,000 | 10× |
| P50 latency | 200 ms | 20 ms | 90 % |
| P99 latency | 2,000 ms | 100 ms | 95 % |
| CPU usage | 90 % | 40 % | 55 % |
| Memory usage | 8 GB | 4 GB | 50 % |
| Error rate | 1 % | 0.01 % | 99 % |
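The percentages in the Improvement column appear to be relative reductions, (before − after) / before. The sketch below reproduces that calculation so the figures can be checked; the function name `pct_reduction` is hypothetical and the output is rounded to one decimal place.

```shell
# pct_reduction BEFORE AFTER : relative reduction in percent, one decimal place
pct_reduction() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        if (b == 0) exit 1                  # avoid division by zero
        printf "%.1f\n", (b - a) / b * 100
    }'
}
```

For example, `pct_reduction 200 20` yields 90.0 for P50 latency, and `pct_reduction 90 40` yields 55.6 for CPU usage, which the comparison lists as 55 %.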
10. Advanced Optimization Directions
10.1 Using OpenResty
-- Rate limiting example (run in access_by_lua_block; requires lua_shared_dict my_limit_req_store in the http block)
local limit_req = require "resty.limit.req"
local lim, err = limit_req.new("my_limit_req_store", 200, 100)
if not lim then
ngx.log(ngx.ERR, "failed to instantiate a resty.limit.req object: ", err)
return ngx.exit(500)
end
local key = ngx.var.binary_remote_addr
local delay, err = lim:incoming(key, true)
if not delay then
if err == "rejected" then
return ngx.exit(503)
end
ngx.log(ngx.ERR, "failed to limit req: ", err)
return ngx.exit(500)
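end
-- Within the burst: hold the request for the computed delay
if delay >= 0.001 then
ngx.sleep(delay)
end
10.2 HTTP/3 QUIC Support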
# Build with HTTP/3 support (mainline nginx 1.25.0+)
./configure --with-http_v3_module
# HTTP/3 configuration
server {
listen 443 quic reuseport;
listen 443 ssl http2;
ssl_protocols TLSv1.3;
add_header Alt-Svc 'h3=":443"; ma=86400';
}
Conclusion and Recommendations
By following the optimization steps above, you should be able to:
System level: Tune kernel parameters to boost processing capacity.
Nginx configuration: Fine‑tune settings to extract every bit of performance.
Architecture design: Build highly available, scalable setups.
Monitoring & operations: Establish comprehensive observability.
Troubleshooting: Quickly locate and resolve issues.
Remember, performance optimization is iterative; test each change, measure impact, and keep the configuration under version control.
MaGe Linux Operations
