
Unlock Nginx Performance: Proven Strategies to Boost Throughput and Concurrency

This article presents a comprehensive guide to Nginx performance tuning, covering methodology, request lifecycle, application‑level tweaks, and system‑level optimizations to achieve higher concurrency, lower latency, and better resource utilization.

Speaker: Tao Hui, former Huawei and Tencent data‑infrastructure engineer, author of *Deep Understanding of Nginx: Module Development and Architecture Analysis*, now CTO and co‑founder at Zhilianda, focusing on applying internet technology to transform the construction industry.

Today's talk focuses on systematic thinking about Nginx performance to help engineers improve efficiency.

1. Optimization Methodology

The presentation addresses two key problems:

Maintaining a high number of concurrent connections while using memory efficiently.

Ensuring high throughput under high concurrency.

Implementation focuses on three layers: application, framework, and kernel.

Hardware considerations include NIC speed (10G/40G), storage type (SSD vs HDD), and especially CPU performance.

Techniques such as reuseport, fastsocket, and coroutine‑based OpenResty reduce context‑switch costs and improve CPU utilization.
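A minimal sketch of the worker-level directives mentioned above; the values are illustrative, not recommendations:

```nginx
# Per-worker listening sockets (SO_REUSEPORT) plus CPU pinning.
worker_processes   auto;   # one worker per CPU core
worker_cpu_affinity auto;  # pin workers to cores to keep CPU caches warm

events {
    worker_connections 65535;   # raise the per-worker connection cap
}

http {
    server {
        # reuseport gives each worker its own listening socket, so the
        # kernel distributes new connections without an accept lock
        listen 80 reuseport;
    }
}
```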

2. The Life Cycle of a Request

Understanding the request flow clarifies where to optimize.

Nginx modules form a processing pipeline; each request passes through an ordered chain of modules in the relevant subsystem (e.g., the ngx_http or ngx_stream module chains).

2.1 Request Arrival

When a new connection is accepted, the kernel places it in a queue, epoll waits for events, and Nginx allocates a connection memory pool that is released only when the connection closes.

A 60‑second timer is set to close idle connections, and a read buffer (default 1 KB) is allocated for the request line and headers.
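The two values above correspond directly to directives whose stock defaults match the talk's numbers:

```nginx
# Illustrative defaults for the connection-establishment phase.
http {
    client_header_buffer_size 1k;   # initial read buffer per request
    client_header_timeout     60s;  # reap connections idle while reading headers
}
```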

2.2 Request Parsing

The URI and headers are read into a request memory pool (default 4 KB). If the request is larger, Nginx expands the pool in 8 KB increments, keeping pointers to parsed data without freeing them until the request ends.
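When the request line or a single header outgrows the initial buffer, Nginx switches to larger buffers; the stock defaults are shown below, and exceeding them yields a 414 or 400 response:

```nginx
http {
    large_client_header_buffers 4 8k;   # up to four 8 KB buffers per request
}
```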

After header parsing, Nginx proceeds through 11 processing phases (post‑read, rewrite, access, etc.).

2.3 Reverse Proxy

For slow client connections, Nginx buffers the entire request body (default buffer 8 KB, or 16 KB on some platforms) before opening an upstream connection, reducing upstream load at the cost of extra memory.
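A sketch of the buffering behavior described above; the upstream address is a placeholder:

```nginx
# Request buffering decouples slow clients from upstreams: Nginx reads the
# whole body first, then opens the upstream connection.
http {
    server {
        listen 80;
        location / {
            client_body_buffer_size 8k;   # in-memory body buffer
            proxy_request_buffering on;   # default: buffer before proxying
            proxy_pass http://127.0.0.1:8080;
        }
    }
}
```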

2.4 Response Generation

The response passes through header, write, postpone, and copy filters. OpenResty directives and SDK hooks can be inserted at appropriate stages.

3. Application‑Layer Optimizations

3.1 Protocol

Adopting HTTP/2, header compression, and other protocol features can significantly boost performance, though compression-related features may introduce security trade‑offs.
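Enabling HTTP/2 (which brings HPACK header compression) is a one-line change on a TLS listener; the certificate paths below are placeholders:

```nginx
server {
    listen 443 ssl http2;
    ssl_certificate     /etc/nginx/certs/example.crt;
    ssl_certificate_key /etc/nginx/certs/example.key;
}
```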

3.2 Compression

Choosing between dynamic (on‑the‑fly) and static (precompressed) compression trades CPU load against disk space.

3.3 Keepalive

Reusing connections via keepalive improves throughput, since every new TCP connection must pass through slow start.
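Keepalive matters on the upstream side too; a minimal sketch (the upstream address is a placeholder):

```nginx
# Cached upstream connections avoid a TCP handshake and slow start per request.
upstream backend {
    server 127.0.0.1:8080;
    keepalive 32;                 # idle connections cached per worker
}

server {
    listen 80;
    location / {
        proxy_http_version 1.1;           # keepalive requires HTTP/1.1
        proxy_set_header Connection "";   # strip the default "Connection: close"
        proxy_pass http://backend;
    }
}
```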

3.4 Rate Limiting

Limiting the response rate to clients (not upstream) helps control bandwidth consumption and smooths traffic spikes.
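A sketch of client-side rate limiting; the location and values are illustrative:

```nginx
# Caps the response rate to each client (does not throttle upstream reads).
server {
    listen 80;
    location /downloads/ {
        limit_rate_after 1m;    # first 1 MB at full speed
        limit_rate       256k;  # then cap at 256 KB/s per connection
    }
}
```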

3.5 Worker Load Balancing

Disabling the inter‑process accept mutex can increase throughput but may cause uneven worker utilization; enabling the reuseport listen parameter instead lets the kernel balance new connections across workers more evenly.
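The two approaches can be sketched as:

```nginx
events {
    accept_mutex off;   # all workers race to accept new connections
}
http {
    server {
        listen 80 reuseport;   # kernel hashes connections to per-worker sockets
    }
}
```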

3.6 Timeouts

Nginx uses a red‑black tree to manage timers; proper timeout settings improve TCP resource reuse and reduce half‑open connections.
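Illustrative timeout settings in the spirit of the advice above (values are examples, not recommendations):

```nginx
# Tighter timeouts free TCP resources held by dead or stalled peers.
http {
    keepalive_timeout         65s;  # idle keepalive connections
    client_body_timeout       12s;  # between successive body reads
    send_timeout              10s;  # between successive writes to the client
    reset_timedout_connection on;   # send RST instead of lingering in FIN_WAIT
}
```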

3.7 Caching

Spatial and temporal caching strategies (e.g., pre‑fetching adjacent data blocks) can reduce disk I/O and improve hit rates.
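A sketch combining a content cache with an open-file-descriptor cache; paths, zone names, and sizes are placeholders:

```nginx
http {
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m
                     max_size=1g inactive=60m;

    open_file_cache       max=10000 inactive=30s;  # cache fds and file metadata
    open_file_cache_valid 60s;

    server {
        listen 80;
        location / {
            proxy_cache       app_cache;
            proxy_cache_valid 200 10m;   # cache successful responses for 10 min
            proxy_pass        http://127.0.0.1:8080;
        }
    }
}
```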

3.8 Reducing Disk I/O

Techniques such as sendfile zero‑copy, AIO, SSD usage, and thread‑pooled file reads can yield up to 9× performance gains.
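A sketch of the I/O directives involved; the thread-pool name and thresholds are illustrative:

```nginx
# Zero-copy sends plus thread-pooled reads, so a slow disk read never
# blocks the event loop.
thread_pool io_pool threads=32 max_queue=65536;

http {
    sendfile   on;               # zero-copy in-kernel file send
    tcp_nopush on;               # fill full packets before sending (with sendfile)
    aio        threads=io_pool;  # offload blocking reads to the thread pool
    directio   8m;               # bypass the page cache for files >= 8 MB
}
```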

4. System‑Level Optimizations

Key areas include increasing capacity limits, CPU cache affinity, NUMA‑aware memory placement, fast TCP recovery, and tuning kernel parameters such as TCP_DEFER_ACCEPT.
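TCP_DEFER_ACCEPT is exposed directly on the listen directive; a minimal sketch:

```nginx
# deferred: wake Nginx only when data arrives, not on the bare handshake;
# backlog raises the accept queue (pair with net.core.somaxconn).
server {
    listen 80 deferred backlog=65535;
}
```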

Optimizing TCP parameters (initial window size, retransmission timers) and leveraging multi‑queue NICs further improve throughput.
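A sysctl fragment in the spirit of the kernel tuning above; values are illustrative and should be validated per workload:

```
# /etc/sysctl.conf fragment
net.core.somaxconn = 65535           # accept queue length (see also listen backlog)
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1            # reuse TIME_WAIT sockets for outbound connections
net.ipv4.tcp_fastopen = 3            # TCP Fast Open for client and server
fs.file-max = 1000000                # system-wide file-descriptor limit
```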

Memory allocation speed, PCRE version, and kernel‑level tweaks complete the performance checklist.

This article is compiled from Tao Hui’s presentation at GOPS 2018 Shanghai.