How to Boost Nginx to Over 1 Million QPS: Real‑World High‑Concurrency Tuning

This guide walks you through a complete, production‑grade Nginx optimization roadmap—covering worker process tuning, TCP and kernel parameters, caching, compression, SSL hardening, upstream connection pooling, monitoring, and a real e‑commerce case study—to transform a 100k‑QPS server into a million‑plus QPS powerhouse.

MaGe Linux Operations

Nginx High‑Concurrency Optimization: From 100k to 1M QPS

As a seasoned operations engineer, I have repeatedly seen performance bottlenecks caused by sub‑optimal Nginx configurations. The following practical guide shows how to systematically tune Nginx so that a service can scale from 100,000 QPS to over one million.

Why Nginx?

In modern micro‑service architectures, Nginx remains the dominant reverse proxy and load balancer. Out‑of‑the‑box settings, however, quickly become a limiting factor under heavy traffic. Systematic tuning can improve throughput tenfold or more.

Performance Comparison Before and After Optimization

Typical results after applying the steps below:

Default configuration: 80,000 QPS, 125 ms latency, 85 % CPU, 2.1 GB RAM.

Basic tuning: 250,000 QPS, 45 ms latency, 68 % CPU, 1.8 GB RAM.

Deep tuning: 600,000 QPS, 18 ms latency, 45 % CPU, 1.5 GB RAM.

Extreme tuning: >1,200,000 QPS, 8 ms latency, 35 % CPU, 1.2 GB RAM.

Stage 1 – Basic Configuration

1.1 Worker Process Tuning

# Set worker processes based on CPU cores
worker_processes auto;
# Bind workers to specific cores to avoid migration overhead
worker_cpu_affinity auto;
# Max connections per worker
events {
    worker_connections 65535;
    use epoll;               # Linux event model
    multi_accept on;         # Accept multiple connections at once
}

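On a 32-core server, worker_processes auto starts one worker per detected core, which is the same as setting 32 by hand; its advantage is that the value stays correct when the hardware changes. Nginx does not account for NUMA topology itself, so any additional gain comes from worker_cpu_affinity auto pinning each worker to its own core.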
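The worker settings above put a ceiling on concurrent clients. A rough sketch of that arithmetic follows; max_clients is a hypothetical helper, and halving the ceiling for proxying assumes each client connection is paired with one upstream connection.

```shell
# max_clients WORKERS CONNECTIONS [proxy]
# Upper bound on simultaneous clients: workers x worker_connections.
# In proxy mode each client also holds one upstream connection,
# so the bound is halved (an assumption about the traffic pattern).
max_clients() (
    total=$(( $1 * $2 ))
    if [ "${3:-static}" = "proxy" ]; then
        total=$(( total / 2 ))
    fi
    echo "$total"
)

max_clients 32 65535          # 2097120 (static content)
max_clients 32 65535 proxy    # 1048560 (reverse proxy)
```

The result is only a ceiling on connections; the file-descriptor limits in Stage 2 have to be high enough for the workers to reach it.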

1.2 TCP Connection Optimisation

http {
    # Reduce small‑packet latency
    tcp_nodelay on;
    # Coalesce headers and file data into full packets (takes effect only when sendfile is on)
    tcp_nopush on;
    # Enable persistent connections
    keepalive_timeout 65;
    keepalive_requests 10000;
    # Request body limits
    client_max_body_size 20m;
    client_body_buffer_size 128k;
    # Header buffers
    client_header_buffer_size 4k;
    large_client_header_buffers 8 8k;
}

Stage 2 – Kernel Parameter Tuning

2.1 System‑Level Tweaks

# /etc/sysctl.conf
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_max_syn_backlog = 65535

# TCP reuse and timeout
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10

# TCP buffer sizes
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

# File descriptor limit
fs.file-max = 6815744
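One common way to reason about the TCP buffer ceilings above is the bandwidth-delay product: the number of bytes that can be in flight on a link. The sketch below is illustrative; bdp_bytes is a hypothetical helper and the link speeds and round-trip times are assumed values, not measurements from this guide.

```shell
# bdp_bytes LINK_MBIT RTT_MS
# Bandwidth-delay product in bytes:
#   (Mbit/s * 1,000,000 / 8) bytes/s * (RTT_MS / 1000) s = Mbit * 125 * RTT_MS
bdp_bytes() (
    echo $(( $1 * 125 * $2 ))
)

bdp_bytes 10000 50   # 62500000, just under the 67108864-byte (64 MiB) maximum
bdp_bytes 1000 20    # 2500000
```

If the product for your slowest, longest path is well below the configured maximum, a smaller ceiling saves memory under a high connection count.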

2.2 User‑Level Limits

# /etc/security/limits.conf
nginx soft nofile 655350
nginx hard nofile 655350
nginx soft nproc 655350
nginx hard nproc 655350

Remember to also adjust the systemd service file:

[Service]
LimitNOFILE=655350
LimitNPROC=655350
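It is worth confirming that the limit actually reached the running process, since limits.conf does not apply to services started by systemd. The following is a minimal sketch: soft_nofile and nofile_ok are hypothetical helpers, and the rule of two descriptors per proxied connection is a rule of thumb rather than an Nginx requirement.

```shell
# soft_nofile: read the text of /proc/<pid>/limits on stdin and
# print the soft "Max open files" value (fourth field of that line).
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# nofile_ok SOFT_LIMIT WORKER_CONNECTIONS
# Succeeds when the limit covers two descriptors per connection
# (client socket plus upstream socket; an assumed rule of thumb).
nofile_ok() {
    [ "$1" -ge $(( $2 * 2 )) ]
}

# Example against a running master process (the pid file path is an assumption):
# soft_nofile < "/proc/$(cat /run/nginx.pid)/limits"
```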

Stage 3 – Cache and Compression

3.1 Static File Cache Policy

location ~* \.(jpg|jpeg|png|gif|ico|css|js|pdf|txt)$ {
    expires 1y;
    add_header Cache-Control "public, immutable";
    # Enable gzip static files
    gzip_static on;
    # Disable access log for static assets
    access_log off;
    # Zero‑copy sendfile
    sendfile on;
    sendfile_max_chunk 1m;
}
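gzip_static only serves a compressed file when a .gz sibling already exists next to the original, and it requires Nginx to be built with ngx_http_gzip_static_module. A minimal sketch of generating those siblings is shown below; precompress is a hypothetical helper and the extension list is an assumption (images are skipped because they are already compressed).

```shell
# precompress DIR
# Write FILE.gz next to each text asset under DIR, keeping the original
# so clients without gzip support can still be served.
precompress() {
    find "$1" -type f \( -name '*.css' -o -name '*.js' -o -name '*.txt' \) |
    while IFS= read -r f; do
        gzip -9 -c "$f" > "$f.gz"
    done
}

# Example (the document root is an assumption):
# precompress /usr/share/nginx/html
```

Running this as part of the deploy step keeps the compressed copies in sync with the originals.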

3.2 Dynamic Compression

# Gzip settings
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_comp_level 6;
gzip_types text/plain text/css text/xml text/javascript application/json application/javascript application/xml+rss application/atom+xml;

# Brotli (requires compiled module)
brotli on;
brotli_comp_level 6;
brotli_types text/plain text/css application/json application/javascript;

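Enabling Brotli is reported to reduce transferred data by ~25 % and improve page load speed by ~35 %, though the gain depends on content type and applies only to clients that send br in Accept-Encoding.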

Stage 4 – Advanced Optimisation Techniques

4.1 Upstream Connection Pool

upstream backend {
    least_conn;                     # Load‑balancing algorithm
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
    keepalive 300;
    keepalive_requests 1000;
    keepalive_timeout 60s;
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering on;
        proxy_buffer_size 128k;
        proxy_buffers 8 128k;
        proxy_busy_buffers_size 256k;
        proxy_connect_timeout 5s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;
    }
}
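The keepalive value in the upstream block is a per-worker cache of idle connections, so the total that the backends may see is a multiple of it. The sketch below works through that arithmetic; idle_upstream_cap is a hypothetical helper, and the even split across backends is an assumption.

```shell
# idle_upstream_cap KEEPALIVE WORKERS BACKENDS
# Prints "<total idle connections> <per backend if spread evenly>".
# keepalive caps idle connections per worker process, not per server.
idle_upstream_cap() (
    total=$(( $1 * $2 ))
    echo "$total $(( total / $3 ))"
)

idle_upstream_cap 300 32 3   # 9600 3200
```

Each backend therefore needs headroom for a few thousand idle connections with these settings, on top of the connections that are actively serving requests.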

4.2 SSL/TLS Optimisation

# SSL settings
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers off;

# Session cache
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;

# OCSP stapling
ssl_stapling on;
ssl_stapling_verify on;

# Buffer size
ssl_buffer_size 4k;

# Hardware acceleration (if supported)
ssl_engine qat;
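The Nginx documentation states that one megabyte of shared session cache holds about 4000 sessions, which gives a quick way to check the 50m zone against expected traffic. The helpers below are hypothetical, and the new-session rate is an assumed value.

```shell
# session_capacity ZONE_MB
# Approximate sessions that fit, at about 4000 sessions per megabyte.
session_capacity() (
    echo $(( $1 * 4000 ))
)

# seconds_to_fill ZONE_MB NEW_SESSIONS_PER_SECOND
# How long until the zone is full and the oldest sessions get evicted.
seconds_to_fill() (
    echo $(( $1 * 4000 / $2 ))
)

session_capacity 50        # 200000
seconds_to_fill 50 1000    # 200
```

At 1000 new sessions per second the zone fills in a few minutes, so the 1d timeout is an upper bound rather than the effective lifetime of a cached session.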

4.3 Memory Pool Optimisation

# Connection pool size
connection_pool_size 512;
# Request pool size
request_pool_size 8k;
# Large header buffers (overrides the 8 8k value set in Stage 1)
large_client_header_buffers 8 16k;
# Reduce allocation frequency
proxy_temp_file_write_size 256k;
# proxy_temp_path takes plain directory levels; levels= and keys_zone= belong to proxy_cache_path
proxy_temp_path /var/cache/nginx/proxy_temp 1 2;

Stage 5 – Monitoring and Continuous Tuning

5.1 Key Metrics Monitoring

# Enable stub_status module
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

Example Bash script that pushes metrics to a StatsD endpoint:

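#!/bin/bash
# nginx_monitor.sh
# Parse stub_status output and send each value to StatsD as a gauge.
curl -s http://localhost/nginx_status | awk '
/Active connections/ {print "active_connections " $3}
# The counters sit on the line after the "server accepts handled requests" header
/accepts/ {getline; print "accepts " $1; print "handled " $2; print "requests " $3}
/Reading/ {print "reading " $2; print "writing " $4; print "waiting " $6}' \
| while read -r metric value; do
    # -w1 stops nc from waiting indefinitely after sending the datagram
    echo "nginx.${metric}:${value}|g" | nc -u -w1 localhost 8125
done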
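stub_status exposes a cumulative request counter rather than a rate, so QPS has to be derived from two samples. A minimal sketch follows; requests_total and qps are hypothetical helpers, and the 60-second interval in the usage example is an assumed value.

```shell
# requests_total: read stub_status output on stdin and print the
# cumulative request counter (third number on the third line).
requests_total() {
    awk 'NR == 3 { print $3 }'
}

# qps FIRST_SAMPLE SECOND_SAMPLE INTERVAL_SECONDS
# Average requests per second between the two samples (integer division).
qps() (
    echo $(( ($2 - $1) / $3 ))
)

# Example:
# a=$(curl -s http://localhost/nginx_status | requests_total)
# sleep 60
# b=$(curl -s http://localhost/nginx_status | requests_total)
# qps "$a" "$b" 60
```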

5.2 Performance Benchmarking

# wrk load test
wrk -t32 -c1000 -d60s --latency http://your-domain.com/

# ApacheBench comparison
ab -n 100000 -c 1000 http://your-domain.com/

# Custom Lua script with wrk
wrk -t32 -c1000 -d60s -s post.lua http://your-domain.com/api
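To compare runs between tuning stages, the throughput figure can be pulled out of the wrk report and checked against a target. This is a sketch under the assumption that the report contains wrk's usual "Requests/sec:" summary line; wrk_rps and meets_target are hypothetical helpers.

```shell
# wrk_rps: read a wrk report on stdin and print the Requests/sec value.
wrk_rps() {
    awk '/^Requests\/sec:/ { print $2 }'
}

# meets_target MEASURED TARGET
# Succeeds when the measured rate is at least the target.
# awk does the comparison because the measured rate is a decimal.
meets_target() {
    awk -v rps="$1" -v target="$2" 'BEGIN { exit !(rps + 0 >= target + 0) }'
}

# Example:
# rps=$(wrk -t32 -c1000 -d60s --latency http://your-domain.com/ | wrk_rps)
# meets_target "$rps" 600000 && echo "deep-tuning target reached"
```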

Extreme Optimisation: Breaking the Million‑QPS Barrier

6.1 Kernel Bypass with DPDK

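# Stock Nginx has no DPDK configure option; kernel bypass needs a DPDK-based port
# such as F-Stack. The flag below is illustrative and depends on the port you build.
./configure --with-dpdk=/path/to/dpdk

# Bind network queues to specific CPUs (IRQ numbers are examples; see /proc/interrupts)
# The value is a hexadecimal CPU bitmask: 2 = CPU 1, 4 = CPU 2
echo 2 > /proc/irq/24/smp_affinity
echo 4 > /proc/irq/25/smp_affinity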
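The values written to smp_affinity are hexadecimal CPU bitmasks, with bit N selecting CPU N. A small sketch of deriving the mask from a CPU index is shown below; cpu_mask is a hypothetical helper and assumes an index below 32, since larger systems use comma-separated 32-bit groups.

```shell
# cpu_mask CPU_INDEX
# Print the hexadecimal bitmask that selects a single CPU (index below 32).
cpu_mask() (
    printf '%x\n' $(( 1 << $1 ))
)

cpu_mask 1   # 2
cpu_mask 2   # 4
cpu_mask 5   # 20
```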

6.2 JIT Compilation (OpenResty LuaJIT)

location /api {
    content_by_lua_block {
        ngx.header.content_type = "application/json"
        ngx.say('{"status": "ok"}')
    }
}

6.3 Zero‑Copy and Asynchronous I/O

# Zero-copy transfer via the sendfile system call
sendfile on;

# Asynchronous I/O
aio threads;
aio_write on;

# Direct I/O for large files
directio 4m;
directio_alignment 512;

Real‑World Case: E‑Commerce Flash‑Sale System

During a Double‑11 flash‑sale, the target peak was 1.5 M QPS with < 50 ms latency and 99.99 % availability. The following specialised configuration was applied:

# Seckill‑specific upstream
upstream seckill_backend {
    hash $remote_addr consistent;
    server 10.0.1.10:8080 weight=3 max_conns=3000;
    server 10.0.1.11:8080 weight=3 max_conns=3000;
    server 10.0.1.12:8080 weight=4 max_conns=4000;
    keepalive 1000;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=seckill:100m rate=100r/s;
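limit_conn_zone $binary_remote_addr zone=conn_seckill:100m;

# Cache zone referenced by proxy_cache below (path and size are assumed values)
proxy_cache_path /var/cache/nginx/seckill keys_zone=seckill_cache:100m;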

server {
    location /seckill {
        limit_req zone=seckill burst=200 nodelay;
        limit_conn conn_seckill 10;
        proxy_cache seckill_cache;
        proxy_cache_valid 200 302 5s;
        proxy_cache_valid 404 1m;
        proxy_connect_timeout 1s;
        proxy_send_timeout 2s;
        proxy_read_timeout 2s;
        proxy_pass http://seckill_backend;
    }
}

The system sustained 1.68 M QPS with an average response time of 32 ms, meeting the business SLA.
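The rate-limit settings in this configuration can be sanity-checked with a little arithmetic. The sketch below is approximate: limit_req_states relies on the documented figure of about 8000 states per megabyte for $binary_remote_addr on 64-bit platforms, and burst_window is a simplified model of burst with nodelay that ignores the first request not counting as excess. Both helper names are hypothetical.

```shell
# limit_req_states ZONE_MB
# Approximate number of client addresses a limit_req zone can track
# (about 8000 128-byte states per megabyte on 64-bit platforms).
limit_req_states() (
    echo $(( $1 * 8000 ))
)

# burst_window RATE BURST SECONDS
# Rough upper bound on requests one address gets through in SECONDS
# with "burst=BURST nodelay": the burst up front, then RATE per second.
burst_window() (
    echo $(( $2 + $1 * $3 ))
)

limit_req_states 100       # 800000
burst_window 100 200 1     # 300
```

With a 100m zone the limiter can track roughly 800,000 distinct addresses before the oldest states are evicted.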

Optimization Checklist

Basic Optimisation

Set worker processes to CPU core count.

Enable epoll event model.

Increase worker_connections.

Fine‑tune keep‑alive settings.

Adjust buffer sizes appropriately.

System Optimisation

Apply kernel TCP and memory tweaks.

Raise file‑descriptor limits.

Optimise TCP parameters.

Evaluate transparent huge pages, and benchmark before enabling them.

Advanced Optimisation

Configure upstream connection pools.

Enable gzip and Brotli compression.

Harden SSL/TLS parameters.

Implement intelligent cache strategies.

Deploy CDN acceleration where applicable.

Monitoring Optimisation

Set up performance metrics collection.

Define alerting rules.

Establish performance baselines.

Run regular stress tests.

Analyse access logs for anomalies.

Common Pitfalls & Solutions

Pitfall 1: Too Many worker_processes

Result: Excessive context switches and degraded performance. Solution: Use worker_processes auto so Nginx determines the optimal number.

Pitfall 2: Ignoring Upstream Connection Reuse

Result: High backend connection overhead. Solution: Properly configure keepalive and related parameters.

Pitfall 3: SSL Handshake Overhead

Result: HTTPS noticeably slower than HTTP. Solution: Enable session caching, OCSP stapling, and hardware acceleration.

Pitfall 4: Logging Becomes a Bottleneck

Result: Disk I/O spikes. Solution: Use buffered logging (the buffer and flush parameters of access_log) or disable unnecessary access logs.

Future Trends

HTTP/3 & QUIC Support

# Experimental HTTP/3 enablement (the quic listen parameter requires Nginx 1.25.0 or later)
listen 443 quic reuseport;
listen 443 ssl http2;
add_header Alt-Svc 'h3=":443"; ma=86400';

Edge Computing Integration

With 5G and edge‑node deployments, Nginx is extending to edge locations to provide ultra‑low latency services.

AI‑Driven Smart Optimisation

Future Nginx versions may embed machine‑learning models that automatically adjust configuration parameters based on real‑time traffic patterns.

Conclusion

Systematic Nginx optimisation can lift throughput from 100 k QPS to the million‑level range. The key steps are layered tuning—from basic worker settings to kernel tweaks, from caching/compression to advanced connection‑pool and SSL hardening, combined with continuous monitoring and stress testing. Treat performance optimisation as an ongoing process that evolves with business needs.
