How to Supercharge Nginx for Millions of QPS: A Complete Guide

Discover proven strategies to optimize Nginx under extreme traffic, covering benchmark testing, kernel tuning, configuration tweaks, caching, load balancing, SSL hardening, monitoring, and real-world case studies that demonstrate how to achieve stable high‑QPS performance while minimizing latency and resource usage.

MaGe Linux Operations

Aug 29, 2025

Nginx High‑Concurrency Performance Tuning and Architecture Design: A Complete Guide

Introduction: Why You Need Nginx Performance Tuning

During a Double‑11 promotion, an e‑commerce platform saw 8 million users within three minutes and QPS spike to 500 k, while the optimized Nginx cluster kept CPU usage below 60%.

This is the power of Nginx performance tuning.

If you are facing any of the following problems:

Frequent 502/504 errors when traffic surges

Nginx CPU usage stays high but QPS does not increase

No clear idea how to design a highly available Nginx architecture

A wish to squeeze maximum performance out of your servers, but no clear starting point

This article will help you solve these issues by sharing five years of large‑scale production experience, pitfalls, and exclusive optimization tricks.

1. Performance Benchmarking: Know Your Baseline

Before optimizing, measure the current performance baseline. Many people jump straight to parameter tweaking, which is a typical mistake: without a baseline there is no way to tell whether a change helped.

1.1 Benchmark Tool Selection and Usage

# Use wrk for benchmarking
wrk -t12 -c400 -d30s --latency http://your-domain.com/

# Use ab for simple testing
ab -n 100000 -c 1000 http://your-domain.com/

# Use vegeta for more precise testing
echo "GET http://your-domain.com/" | vegeta attack -duration=30s -rate=10000 | vegeta report

Practical Tips: Monitor the following key metrics during benchmarking:

QPS/TPS

Response time distribution (P50, P95, P99)

Error rate

CPU/memory/network/disk I/O usage

1.2 Performance Bottleneck Identification

Based on experience, Nginx bottlenecks usually appear in these areas:

Connection limits: default file descriptor limits

CPU bottleneck: improper worker process configuration

Memory bottleneck: unreasonable buffer settings

Network I/O bottleneck: uneven NIC interrupt handling

Disk I/O bottleneck: log writing slows overall performance

2. System‑Level Optimizations: Laying the Foundation

2.1 Kernel Parameter Tuning

The following parameters are a starting point for a dedicated high‑traffic Nginx host. Check each value against your kernel version and workload before applying it in production:

# /etc/sysctl.conf

# Maximum number of file handles
fs.file-max = 2000000
fs.nr_open = 2000000

# Network optimizations
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
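# tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12.
# Setting it on a newer kernel makes sysctl report an unknown key, so it
# stays commented out; on older kernels keep it disabled.
# net.ipv4.tcp_tw_recycle = 0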
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3

# TCP buffer optimizations
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216

# Connection tracking table (these keys exist only when the nf_conntrack module is loaded)
net.netfilter.nf_conntrack_max = 1000000
net.nf_conntrack_max = 1000000
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# BBR congestion control (kernel 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

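Key Points:

tcp_tw_reuse: Allows sockets in TIME_WAIT to be reused for new outbound connections. On an Nginx proxy this helps the short‑lived connections it opens to upstream servers; it does not affect connections accepted from clients.

somaxconn: Caps the accept queue length. The backlog Nginx requests on a listening socket is truncated to this value, so it must be increased. To go beyond Nginx's default backlog of 511 on Linux, the listen directive also needs a matching backlog= parameter.

BBR algorithm: Google's congestion control algorithm, which improves throughput on high‑latency or lossy networks.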
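After `sysctl -p`, it is worth confirming that the kernel actually accepted each value, since a key that does not exist on the running kernel is skipped. The following is a minimal sketch, not a definitive tool: `check_sysctl` is a hypothetical helper name, and its optional third argument (a root directory) is an assumption added only so the function can be exercised against a test directory instead of `/proc/sys`.

```shell
#!/bin/sh
# Sketch: compare the live value of a sysctl key with the value you intended.
# check_sysctl KEY EXPECTED [ROOT]   (hypothetical helper; ROOT defaults to /proc/sys)
check_sysctl() {
    key=$1
    expected=$2
    root=${3:-/proc/sys}
    # sysctl keys map to paths: net.core.somaxconn -> net/core/somaxconn
    path="$root/$(printf '%s' "$key" | tr '.' '/')"
    if [ ! -r "$path" ]; then
        echo "MISSING $key"
        return 2
    fi
    # Multi-value entries such as tcp_rmem are tab-separated; normalise whitespace.
    actual=$(tr -s '[:space:]' ' ' < "$path" | sed 's/^ //; s/ $//')
    if [ "$actual" = "$expected" ]; then
        echo "OK $key = $actual"
    else
        echo "DIFF $key = $actual (expected $expected)"
        return 1
    fi
}
```

A typical call would be `check_sysctl net.core.somaxconn 65535`. A `MISSING` result usually means the key is not available on this kernel or the module that provides it is not loaded.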

2.2 File Descriptor Limits

# /etc/security/limits.conf
* soft nofile 1000000
* hard nofile 1000000
* soft nproc 1000000
* hard nproc 1000000

# For systemd‑managed services
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
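The systemd override matters because `/etc/security/limits.conf` is applied through PAM to login sessions, not to services started by systemd. To confirm which limit the running Nginx process really has, one option is to read its `/proc/<pid>/limits` file. This is a small sketch; `soft_nofile` is a hypothetical helper name.

```shell
# Sketch: print the soft "Max open files" limit from a /proc/<pid>/limits file.
# soft_nofile LIMITS_FILE   (hypothetical helper)
soft_nofile() {
    # Columns are: name (three words), soft limit, hard limit, units.
    awk '/^Max open files/ { print $4 }' "$1"
}
```

Usage, assuming the master process writes its PID to `/run/nginx.pid` (the path depends on your `pid` directive): `soft_nofile "/proc/$(cat /run/nginx.pid)/limits"`. If the number is still 1024, the override has not been picked up; run `systemctl daemon-reload` and restart the service.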

3. Nginx Configuration Optimizations: Core Tuning

3.1 Global Configuration

# nginx.conf

user nginx;
worker_processes auto;
worker_rlimit_nofile 1000000;

worker_cpu_affinity auto;

error_log /var/log/nginx/error.log error;

events {
    use epoll;
    worker_connections 65535;
    multi_accept on;
    accept_mutex off;
}

http {
    # Basic optimizations
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    # Connection timeout
    keepalive_timeout 65;
    keepalive_requests 10000;
    reset_timedout_connection on;
    client_body_timeout 10;
    client_header_timeout 10;
    send_timeout 10;

    # Buffer settings
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;
    output_buffers 32 128k;
    postpone_output 1460;

    # File cache
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    # Gzip compression
    gzip on;
    gzip_min_length 1k;
    gzip_buffers 16 64k;
    gzip_http_version 1.1;
    gzip_comp_level 6;
    gzip_types text/plain application/javascript application/x-javascript text/css application/xml text/javascript;
    gzip_vary on;
    gzip_proxied any;
    gzip_disable "MSIE [1-6]\.";

    # Hide version
    server_tokens off;

    # Server name hash
    server_names_hash_bucket_size 128;
    server_names_hash_max_size 512;

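    # Access log. The "main" format must be defined before it is used;
    # $request_time is kept as the last field so latency can be extracted with awk.
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" $request_time';
    access_log /var/log/nginx/access.log main buffer=32k flush=5s;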
}
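The connection ceiling implied by this configuration can be worked out by hand: `worker_connections` counts every connection a worker holds, including connections to upstream servers, so a reverse proxy can serve roughly half as many clients as a static file server. The sketch below does that arithmetic; `max_clients` is a hypothetical helper, and the 32‑worker figure assumes `worker_processes auto` on a 32‑core machine like the ones in Case 1.

```shell
# Sketch: theoretical concurrent-client ceiling for one Nginx instance.
# max_clients WORKERS WORKER_CONNECTIONS [static|proxy]   (hypothetical helper)
max_clients() {
    workers=$1
    conns=$2
    mode=${3:-static}
    total=$((workers * conns))
    if [ "$mode" = "proxy" ]; then
        # Each proxied request holds one client and one upstream connection.
        total=$((total / 2))
    fi
    echo "$total"
}

max_clients 32 65535          # prints 2097120
max_clients 32 65535 proxy    # prints 1048560
```

These are upper bounds only. Memory, file descriptor limits, and backend capacity normally cap real concurrency well before this number is reached.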

3.2 Upstream Server Configuration

upstream backend {
    least_conn;
    keepalive 300;
    keepalive_requests 10000;
    keepalive_timeout 60s;

    server backend1.example.com:8080 max_fails=2 fail_timeout=10s weight=5;
    server backend2.example.com:8080 max_fails=2 fail_timeout=10s weight=5;
    server backend3.example.com:8080 max_fails=2 fail_timeout=10s weight=5 backup;

    # Health check (requires the third-party nginx_upstream_check_module)
    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "HEAD /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}

server {
    listen 80 default_server reuseport;
    listen [::]:80 default_server reuseport;
    server_name _;

    location / {
        proxy_pass http://backend;

        # Proxy optimizations
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_connect_timeout 10s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;

        proxy_buffering on;
        proxy_buffer_size 4k;
        proxy_buffers 32 4k;
        proxy_busy_buffers_size 64k;
        proxy_temp_file_write_size 64k;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_cache_bypass $http_upgrade;
        proxy_no_cache $http_upgrade;
    }

    location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
        expires 30d;
        add_header Cache-Control "public, immutable";
        sendfile on;
        tcp_nopush on;
        access_log off;
        valid_referers none blocked server_names ~\.google\. ~\.baidu\. ~\.bing\.;
        if ($invalid_referer) { return 403; }
    }
}

3.3 Static Resource Optimization

location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
    expires 30d;
    add_header Cache-Control "public, immutable";
    sendfile on;
    tcp_nopush on;
    access_log off;
    valid_referers none blocked server_names ~\.google\. ~\.baidu\. ~\.bing\.;
    if ($invalid_referer) { return 403; }
}

4. Advanced Optimization Techniques

4.1 Cache Strategy Optimization

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:100m max_size=10g inactive=60m use_temp_path=off;

server {
    location /api/ {
        # $request_uri already contains the query string, so $is_args$args is not appended
        proxy_cache_key "$scheme$request_method$host$request_uri";
        proxy_cache my_cache;

        proxy_cache_valid 200 302 10m;
        proxy_cache_valid 404 1m;
        proxy_cache_valid any 1m;

        proxy_cache_lock on;
        proxy_cache_lock_timeout 5s;

        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        add_header X-Cache-Status $upstream_cache_status;

        proxy_cache_background_update on;
        proxy_cache_revalidate on;
    }
}
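The `X-Cache-Status` header above exposes `$upstream_cache_status` on each response. To track the hit ratio over time, one option is to also write that variable to the access log and summarise it. The sketch below assumes a hypothetical log layout in which the cache status sits in a known whitespace‑separated column; `cache_hit_ratio` is an illustrative helper, not part of Nginx.

```shell
# Sketch: cache hit ratio from log lines on stdin.
# cache_hit_ratio COLUMN   (hypothetical helper; COLUMN holds $upstream_cache_status)
cache_hit_ratio() {
    awk -v col="$1" '
        $col == "-" || $col == "" { next }   # request never went through the cache
        { total++ }
        $col == "HIT" { hits++ }
        END {
            if (total == 0) { print "no cached requests"; exit }
            printf "%.1f%% (%d/%d)\n", hits * 100 / total, hits, total
        }'
}
```

For example, if the status were logged as the last of ten fields, `cache_hit_ratio 10 < /var/log/nginx/access.log` would print the ratio. Only `HIT` is counted as a hit here; whether `STALE`, `UPDATING`, or `REVALIDATED` should count too is a judgment call, since those responses are also served from the cache.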

4.2 Rate Limiting

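# geo and map are only valid at the http level, so they sit next to the zones.
# Whitelisted addresses get an empty key, and requests with an empty key
# are not counted against the zone.
geo $limit_whitelist {
    default 0;
    10.0.0.0/8 1;
    192.168.0.0/16 1;
}

map $limit_whitelist $limit_req_key {
    0 $binary_remote_addr;
    1 "";
}

limit_req_zone $limit_req_key zone=perip:10m rate=10r/s;
# perserver is defined here but only takes effect once a limit_req directive references it
limit_req_zone $server_name zone=perserver:10m rate=1000r/s;
limit_conn_zone $binary_remote_addr zone=connperip:10m;

server {
    limit_req zone=perip burst=20 delay=10;
    limit_conn connperip 10;
}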

4.3 SSL/TLS Optimization

server {
    listen 443 ssl http2 reuseport;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:HIGH:!aNULL:!MD5:!RC4:!DHE;
    ssl_prefer_server_ciphers on;

    ssl_session_cache shared:SSL:50m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;

    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /path/to/chain.pem;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}

5. High‑Availability Architecture Design

5.1 Master‑Slave Architecture

# keepalived configuration example (keepalived.conf does not use trailing semicolons)
vrrp_script check_nginx {
    script "/usr/local/bin/check_nginx.sh"
    interval 2
    # Subtract 5 from the priority on failure. The BACKUP node's priority
    # must be above 95 for failover to happen.
    weight -5
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        check_nginx
    }
}
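The `vrrp_script` above calls `/usr/local/bin/check_nginx.sh`, which the article does not show. keepalived treats exit status 0 as healthy and anything else as a failure, so a minimal version only needs to report whether the Nginx master process is alive. The following is a sketch under stated assumptions: the function name `nginx_alive` is hypothetical, and the PID file path is an assumption that must match the `pid` directive in your nginx.conf.

```shell
#!/bin/sh
# Hypothetical check_nginx.sh body for the vrrp_script above.
# nginx_alive [PIDFILE] returns 0 if the process named in PIDFILE exists.
nginx_alive() {
    pidfile=${1:-/run/nginx.pid}      # assumed default path
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric content before handing it to kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only tests that the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# In the deployed script, finish with:
#   nginx_alive /run/nginx.pid
#   exit $?
```

A process check is the cheapest probe, but it cannot tell whether Nginx is actually answering requests. A stricter variant could request a local health URL and fail on a non‑2xx response, at the cost of a slower check.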

5.2 Load‑Balancing Architecture

In ultra‑high‑concurrency scenarios, a four‑layer plus seven‑layer load‑balancing architecture is often used:

Internet
    ↓
LVS/F5 (Layer‑4 load balancer)
    ↓
Nginx cluster (Layer‑7 load balancer)
    ↓
Application server cluster

Advantages:

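LVS forwards packets at Layer 4 inside the kernel, so it can absorb far more traffic than a Layer‑7 proxy, on the order of tens of millions of QPS.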

Nginx provides flexible Layer‑7 load balancing and caching.

Dual‑layer load balancing improves high availability.

5.3 Static‑Dynamic Separation Architecture

# CDN origin configuration
location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
    add_header Cache-Control "public, max-age=31536000";
    # Origin authentication
    set $auth_token "";
    if ($http_x_cdn_auth = "your-secret-token") {
        set $auth_token "valid";
    }
    if ($auth_token != "valid") { return 403; }
}

location /api/ {
    proxy_pass http://backend;
    add_header Cache-Control "no-cache, no-store, must-revalidate";
}

6. Monitoring and Troubleshooting

6.1 Performance Monitoring

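# Built-in counters (requires ngx_http_stub_status_module)
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

# Per-vhost traffic page (requires the third-party nginx-module-vts and a
# vhost_traffic_status_zone directive in the http block)
location /status {
    vhost_traffic_status_display;
    vhost_traffic_status_display_format html;
    allow 127.0.0.1;
    deny all;
}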
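The `stub_status` page is plain text with a fixed layout, so it can be turned into a one‑line summary for a cron job or dashboard. The sketch below parses that layout; `stub_status_summary` is a hypothetical helper name. The "dropped" figure is accepts minus handled, which stays at zero unless a resource limit such as `worker_connections` is being hit.

```shell
# Sketch: summarise stub_status output read from stdin.
# stub_status_summary   (hypothetical helper)
stub_status_summary() {
    awk '
        /^Active connections:/ { active = $3 }
        # The counters line holds three integers: accepts handled requests
        /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { accepts = $1; handled = $2; requests = $3 }
        /^Reading:/ { reading = $2; writing = $4; waiting = $6 }
        END {
            printf "active=%d waiting=%d dropped=%d req_per_conn=%.2f\n",
                active, waiting, accepts - handled,
                (handled > 0 ? requests / handled : 0)
        }'
}
```

Usage from the Nginx host itself: `curl -s http://127.0.0.1/nginx_status | stub_status_summary`. A low `req_per_conn` value suggests clients are not reusing keepalive connections.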

6.2 Log Analysis

# Find most active IPs
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

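# Analyze response times (assumes $request_time is the last field of each log line)
awk '{print $NF}' access.log | sort -n | awk '
# Index of the p-th percentile, clamped so that small samples never read value[0]
function pct(p,    i) { i = int(NR * p); if (i < 1) i = 1; return value[i] }
{
    value[NR] = $1    # input is sorted, so value[i] is the i-th smallest sample
    sum += $1
}
END {
    if (NR == 0) { print "no samples"; exit }
    print "Average:", sum / NR
    print "P50:", pct(0.50)
    print "P95:", pct(0.95)
    print "P99:", pct(0.99)
}'

# Real-time error monitoring
tail -f error.log | grep -E "error|alert|crit"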
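Error rate is one of the key metrics listed in section 1.1, and it can be read from the same access log. The sketch below groups responses by status class; it assumes the status code is the ninth whitespace‑separated field, which holds for the standard combined layout when the request line has the usual three tokens. `status_breakdown` is a hypothetical helper name.

```shell
# Sketch: count responses per status class (2xx, 3xx, 4xx, 5xx) from stdin.
# status_breakdown   (hypothetical helper; assumes $status is field 9)
status_breakdown() {
    awk '
        { class = substr($9, 1, 1) "xx"; count[class]++; total++ }
        END {
            for (c in count)
                printf "%s %d %.2f%%\n", c, count[c], count[c] * 100 / total
        }' | sort
}
```

Usage: `status_breakdown < access.log`. A rising 5xx share during a load test is usually the first sign that the backend or a timeout setting, rather than Nginx itself, is the bottleneck.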

6.3 Performance Analysis Tools

# Nginx Amplify
curl -L -O https://github.com/nginxinc/nginx-amplify-agent/raw/master/packages/install.sh
sh ./install.sh

# ngxtop
ngxtop -l /var/log/nginx/access.log

# GoAccess
goaccess /var/log/nginx/access.log -o report.html --log-format=COMBINED

7. Real‑World Case Studies

Case 1: E‑commerce Flash Sale Handling Millions of QPS

Background: Expected QPS peak of 1 million during Double‑11.

Solution:

Deploy 20 Nginx servers, each with 32 CPU cores and 64 GB RAM.

Use LVS for Layer‑4 load balancing.

Push all static assets to CDN.

Cache hot data with Redis.

Configure rate limiting to block abusive traffic.

Results:

Actual peak QPS: 1.2 million

Average latency: 50 ms

P99 latency: 200 ms

Error rate: 0.01 %
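These figures make the sizing arithmetic easy to check: 1.2 million QPS spread evenly over 20 nodes is an average of 60,000 QPS per node. The sketch below reproduces that calculation and inverts it for capacity planning. Both function names are hypothetical, and the even distribution across nodes is an assumption.

```shell
# Sketch: sizing arithmetic for an Nginx tier (hypothetical helpers).
# per_node_qps PEAK_QPS NODES
per_node_qps() {
    echo $(( $1 / $2 ))
}

# nodes_needed PEAK_QPS PER_NODE_CAPACITY [SPARE_NODES]
nodes_needed() {
    peak=$1
    capacity=$2
    spare=${3:-0}
    # Integer ceiling of peak / capacity, plus spare nodes for failures.
    echo $(( (peak + capacity - 1) / capacity + spare ))
}

per_node_qps 1200000 20        # prints 60000
nodes_needed 1200000 60000 2   # prints 22
```

The per‑node capacity should come from your own benchmark of a single node, not from another deployment's numbers.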

Case 2: API Gateway Performance Optimization

Background: API gateway became a bottleneck in a micro‑service architecture.

Solution:

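# Dynamic routing with Lua (requires OpenResty or lua-nginx-module).
# The targets must be resolvable: define matching upstream blocks, or add a
# resolver directive if they are DNS names.
location /api {
    set $backend '';
    rewrite_by_lua_block {
        -- Ordered list: pairs() iterates in no guaranteed order, and a plain
        -- prefix comparison avoids running a regex for every route.
        local routes = {
            { "/api/user",    "http://user-service" },
            { "/api/order",   "http://order-service" },
            { "/api/product", "http://product-service" },
        }
        local uri = ngx.var.uri
        for _, route in ipairs(routes) do
            if uri:sub(1, #route[1]) == route[1] then
                ngx.var.backend = route[2]
                break
            end
        end
        -- No route matched: fail fast instead of proxying to an empty URL.
        if ngx.var.backend == "" then
            return ngx.exit(ngx.HTTP_NOT_FOUND)
        end
    }
    proxy_pass $backend;
}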

Impact:

QPS increased by 300 %.

Latency reduced by 60 %.

CPU usage dropped 40 %.

8. Common Issues and Solutions

8.1 502 Bad Gateway

Typical Causes:

Backend server crash.

Connection timeout too short.

Insufficient buffer size.

Fixes:

# Increase timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;

# Increase buffers
proxy_buffer_size 64k;
proxy_buffers 32 32k;
proxy_busy_buffers_size 128k;

8.2 504 Gateway Timeout

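# Extend timeouts
proxy_read_timeout 300s;
fastcgi_read_timeout 300s;

# Enable keepalive to upstreams. This only takes effect when the proxying
# location also sets proxy_http_version 1.1 and proxy_set_header Connection "".
upstream backend {
    server backend1.example.com:8080;
    keepalive 32;
}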
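Before changing timeouts or buffers, it helps to see which upstream failure is actually occurring, because each one points to a different fix. The sketch below counts the common upstream messages in the error log and labels each with the status code it typically produces. `upstream_error_summary` is a hypothetical helper, and the mapping is a rule of thumb: with `proxy_next_upstream` retries the final status seen by the client can differ.

```shell
# Sketch: classify upstream-related lines from an Nginx error log on stdin.
# upstream_error_summary   (hypothetical helper)
upstream_error_summary() {
    awk '
        /upstream timed out/                     { n["timeout (504)"]++ }
        /connect\(\) failed/                     { n["connect failed (502)"]++ }
        /upstream prematurely closed connection/ { n["closed early (502)"]++ }
        /upstream sent too big header/           { n["header too big (502)"]++ }
        /no live upstreams/                      { n["no live upstreams (502)"]++ }
        END { for (k in n) printf "%d %s\n", n[k], k }
    ' | sort -rn
}
```

Usage: `upstream_error_summary < /var/log/nginx/error.log`. Mostly "header too big" points at `proxy_buffer_size`, mostly "timeout" at slow backends or `proxy_read_timeout`, and "connect failed" or "no live upstreams" at backends that are down or refusing connections.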

8.3 High Memory Usage

Reduce number of worker processes.

Optimize buffer sizes.

Limit request body size.

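If worker memory keeps growing, a graceful reload replaces the worker processes and releases it. Treat this as a stopgap while you look for the cause, not as a fix.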

9. Performance Test Comparison

| Metric | Before | After | Improvement |
|---|---|---|---|
| QPS | 5,000 | 50,000 | 10× |
| P50 latency | 200 ms | 20 ms | 90 % |
| P99 latency | 2,000 ms | 100 ms | 95 % |
| CPU usage | 90 % | 40 % | 55 % |
| Memory usage | 8 GB | 4 GB | 50 % |
| Error rate | 1 % | 0.01 % | 99 % |
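Apart from the QPS row, the Improvement figures are relative reductions, computed as (before − after) / before. The sketch below reproduces them so the same calculation can be applied to your own before/after measurements. The function names are hypothetical. Note that the CPU row works out to 55.6 %, which the table shows truncated to 55 %.

```shell
# Sketch: relative change between two measurements (hypothetical helpers).
# pct_reduction BEFORE AFTER   prints the percentage drop from BEFORE to AFTER
pct_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

# growth_factor BEFORE AFTER   prints AFTER as a multiple of BEFORE
growth_factor() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", after / before }'
}

pct_reduction 200 20        # P50 latency: prints 90.0
pct_reduction 90 40         # CPU usage:   prints 55.6
growth_factor 5000 50000    # QPS:         prints 10.0
```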

10. Advanced Optimization Directions

10.1 Using OpenResty

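-- Rate limiting example (runs in an access_by_lua_block; requires a
-- lua_shared_dict named my_limit_req_store in the http block)
local limit_req = require "resty.limit.req"
-- 200 req/s sustained, with a burst allowance of 100 req/s above that
local lim, err = limit_req.new("my_limit_req_store", 200, 100)
if not lim then
    ngx.log(ngx.ERR, "failed to instantiate a resty.limit.req object: ", err)
    return ngx.exit(500)
end

local key = ngx.var.binary_remote_addr
local delay, err = lim:incoming(key, true)
if not delay then
    if err == "rejected" then
        return ngx.exit(503)
    end
    ngx.log(ngx.ERR, "failed to limit req: ", err)
    return ngx.exit(500)
end

-- Requests inside the burst allowance are delayed to hold the configured rate
if delay >= 0.001 then
    ngx.sleep(delay)
end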

10.2 HTTP/3 QUIC Support

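# Build with HTTP/3 support (mainline nginx 1.25.0+ and an SSL library with QUIC support).
# --with-http_quic_module and "listen ... http3" belong to the older nginx-quic preview branch.
./configure --with-http_ssl_module --with-http_v2_module --with-http_v3_module

# HTTP/3 configuration
server {
    listen 443 quic reuseport;   # UDP: HTTP/3
    listen 443 ssl;              # TCP: HTTP/1.1 and HTTP/2
    http2 on;                    # nginx 1.25.1+; older builds use "listen 443 ssl http2;"
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols TLSv1.3;
    add_header Alt-Svc 'h3=":443"; ma=86400';
}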

Conclusion and Recommendations

By following the optimization steps above, you should be able to:

System level: Tune kernel parameters to boost processing capacity.

Nginx configuration: Fine‑tune settings to extract every bit of performance.

Architecture design: Build highly available, scalable setups.

Monitoring & operations: Establish comprehensive observability.

Troubleshooting: Quickly locate and resolve issues.

Remember, performance optimization is iterative; test each change, measure impact, and keep the configuration under version control.
