Design Decisions Behind NGINX’s High Performance and Scalability
NGINX achieves top‑tier performance and scalability through three core design decisions: a multi‑process architecture that runs one worker process per CPU core, single‑threaded non‑blocking workers that each handle many connections through an event‑driven state machine, and a master process that isolates privileged tasks.
Owen Garrett, Product Director at NGINX, published a blog post on the official NGINX site explaining the design decisions that give NGINX its world‑class performance and scalability.
The overall architecture of NGINX consists of a set of cooperating processes:
Master process: Performs privileged operations such as reading configuration files, binding sockets, and creating or signalling worker processes.
Worker processes: Receive and handle connection requests, read/write to disk, and communicate with upstream servers. When NGINX is active, only the workers are busy.
Cache loader process: Loads the disk cache into memory at startup and then exits.
Cache manager process: Maintains the disk cache data to keep it within bounds, running intermittently.
NGINX’s high performance and scalability hinge on two fundamental design choices:
Minimising the number of worker processes to reduce context‑switch overhead. The default and recommended configuration assigns one worker per CPU core, making efficient use of hardware resources.
Making each worker process single‑threaded, handling many concurrent connections in a non‑blocking manner.
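The master/worker split behind these two choices can be sketched as follows. This is an illustrative Python sketch, not NGINX's actual C implementation: the privileged parent binds the listening socket once, then forks one worker per CPU core, and the workers inherit the socket. The function name `run_master` is invented for this example.

```python
import os
import socket

def run_master(worker_count=None):
    """Illustrative sketch of the master/worker split (not NGINX source):
    the privileged master binds the listening socket once, then forks one
    single-threaded worker per CPU core; workers inherit the socket."""
    worker_count = worker_count or os.cpu_count() or 1

    # Privileged setup happens exactly once, in the master.
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    lsock.listen(128)

    pids = []
    for _ in range(worker_count):
        pid = os.fork()
        if pid == 0:               # child: a worker process
            # A real worker would run its event loop on the inherited
            # socket here; this sketch just exits cleanly.
            lsock.close()
            os._exit(0)
        pids.append(pid)

    # The master only supervises: it waits for (and would respawn) workers.
    for pid in pids:
        os.waitpid(pid, 0)
    lsock.close()
    return len(pids)

if __name__ == "__main__":
    print(run_master())
```

Because the workers are forked after the socket is bound, none of them needs the privileges required to bind low-numbered ports; only the master does.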
Each worker process uses a state machine to manage multiple connections in a non‑blocking fashion:
Each worker handles several sockets, which may be listening sockets or connection sockets.
When a listening socket receives a new connection, the worker accepts it, opening a connection socket for communicating with that client.
When an event arrives on a connection socket, the worker processes it quickly and immediately moves on to handle events on other sockets.
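The three steps above can be condensed into a minimal event loop. This is a hedged sketch using Python's `selectors` module as a stand-in for the epoll/kqueue mechanisms NGINX uses; the names `serve_once` and `demo` are invented, and the loop deliberately stops after one connection so the example is self-contained.

```python
import selectors
import socket
import threading

def serve_once(lsock):
    """Sketch of one worker's event loop (illustrative, not NGINX source):
    a single thread multiplexes a listening socket and its connection
    sockets through one selector, never blocking on any single peer."""
    sel = selectors.DefaultSelector()
    lsock.setblocking(False)
    sel.register(lsock, selectors.EVENT_READ, data="listen")
    done = False
    while not done:
        for key, _ in sel.select(timeout=1.0):
            if key.data == "listen":
                # New connection on the listening socket: accept it,
                # opening a connection socket for this client.
                conn, _addr = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="conn")
            else:
                # Event on a connection socket: handle it quickly, then
                # return to the loop to serve other sockets.
                conn = key.fileobj
                chunk = conn.recv(4096)
                if chunk:
                    conn.sendall(b"echo:" + chunk)
                sel.unregister(conn)
                conn.close()
                done = True          # demo only: stop after one connection
    sel.close()

def demo():
    """Drive the sketch: connect one client and return the echoed reply."""
    lsock = socket.create_server(("127.0.0.1", 0))
    port = lsock.getsockname()[1]
    t = threading.Thread(target=serve_once, args=(lsock,))
    t.start()
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"hi")
        reply = b""
        while (chunk := c.recv(4096)):
            reply += chunk
    t.join()
    lsock.close()
    return reply

if __name__ == "__main__":
    print(demo())
```

The key property is that the worker thread is only ever blocked inside `sel.select()`, waiting for *any* socket to become ready, rather than inside a read or write on one particular connection.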
Garrett notes that this design fundamentally differentiates NGINX from traditional web servers, which typically allocate a separate thread per connection. While thread‑per‑connection models simplify programming, they incur heavy context‑switch costs because threads spend most of their time blocked waiting for I/O; as concurrent connections and threads grow, or memory becomes scarce, that overhead becomes significant.
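For contrast, the traditional model Garrett describes looks roughly like the sketch below (illustrative only; the function names are invented). Note where each thread spends its life: blocked inside `recv()`, so a server with N idle connections holds N mostly-sleeping threads and their stacks.

```python
import socket
import threading

def thread_per_connection_server(lsock, max_conns=1):
    """The traditional model NGINX avoids (sketch for contrast): every
    accepted connection gets its own dedicated thread."""
    def handle(conn):
        with conn:
            data = conn.recv(4096)     # this thread blocks here on I/O
            if data:
                conn.sendall(b"echo:" + data)

    threads = []
    for _ in range(max_conns):         # demo only: bounded accept loop
        conn, _addr = lsock.accept()
        t = threading.Thread(target=handle, args=(conn,))  # one thread per connection
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

def demo():
    """Drive the sketch: connect one client and return the echoed reply."""
    lsock = socket.create_server(("127.0.0.1", 0))
    port = lsock.getsockname()[1]
    t = threading.Thread(target=thread_per_connection_server, args=(lsock,))
    t.start()
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"ping")
        reply = b""
        while (chunk := c.recv(4096)):
            reply += chunk
    t.join()
    lsock.close()
    return reply

if __name__ == "__main__":
    print(demo())
```

With thousands of connections, this design pays for thousands of thread stacks and the scheduler's context switches between them, which is exactly the overhead the event-driven worker avoids.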
Conversely, NGINX’s design ensures that a worker process never blocks on network traffic unless there is no work to do. Each new connection consumes only minimal resources: a file descriptor and a small amount of worker memory.
In summary, after system tuning, each NGINX worker process can handle hundreds of thousands of concurrent HTTP connections.
