Design Decisions Behind NGINX’s High Performance and Scalability
NGINX achieves high performance and scalability through a multi-process architecture: a single master process, a small set of single-threaded, non-blocking worker processes (typically one per CPU core), and dedicated cache loader and cache manager processes, with each worker handling many concurrent connections through an event-driven state machine.
Owen Garrett, product director at NGINX, wrote a blog post explaining the design decisions that give the NGINX product top‑class performance and scalability.
The overall architecture of NGINX consists of a set of cooperating processes:
Master process: performs privileged operations such as reading configuration files, binding sockets, and creating/signalling worker processes.
Worker processes: handle incoming connections, read/write to disk, and communicate with upstream servers. When NGINX is active, only the workers are busy.
Cache loader process: loads metadata about the on-disk cache into memory at startup and then exits.
Cache manager process: periodically reorganizes disk‑cache data to keep it within bounds.
NGINX’s high performance and scalability hinge on two fundamental design choices:
Limit the number of worker processes as much as possible, typically configuring one worker per CPU core to minimize context‑switch overhead.
Workers are single‑threaded and handle many concurrent connections in a non‑blocking manner.
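In nginx.conf, these two design choices map directly onto a pair of core directives. A minimal sketch (the values shown are illustrative defaults, not tuning recommendations):

```nginx
# One single-threaded worker per CPU core; "auto" detects the core count.
worker_processes auto;

events {
    # Maximum number of concurrent connections each worker's
    # event loop will multiplex.
    worker_connections 1024;
}
```

With `worker_processes auto;`, NGINX matches workers to cores so each worker can stay pinned to a core with minimal context switching.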
Each worker process uses a state machine to manage multiple connections in a non‑blocking fashion:
A worker handles several sockets, which may be listening sockets or connection sockets.
When a listening socket receives a new request, a new connection socket is opened to communicate with the client.
When an event arrives on a connection socket, the worker quickly responds and then proceeds to handle any other newly arrived events on other sockets.
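The loop described above can be sketched in a few lines. This is not NGINX's actual code (NGINX is written in C on top of epoll/kqueue); it is a minimal illustration, using Python's `selectors` module, of one single-threaded worker multiplexing a listening socket and its connection sockets. The helper names (`accept_conn`, `handle_conn`, `make_listener`, `run_loop`) are invented for the example:

```python
import selectors
import socket
import time

# One selector watches every socket the worker owns: the listening
# socket plus all open connection sockets.
sel = selectors.DefaultSelector()

def accept_conn(listen_sock):
    # Event on the listening socket: a new client arrived,
    # so open a connection socket for it.
    conn, _addr = listen_sock.accept()
    conn.setblocking(False)          # never let this socket block the worker
    sel.register(conn, selectors.EVENT_READ, handle_conn)

def handle_conn(conn):
    # Event on a connection socket: respond quickly, then return
    # to the loop to service other ready sockets.
    data = conn.recv(4096)
    if data:
        conn.sendall(b"echo: " + data)
    else:                            # client closed the connection
        sel.unregister(conn)
        conn.close()

def make_listener(port=0):
    listen_sock = socket.socket()
    listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_sock.bind(("127.0.0.1", port))
    listen_sock.listen()
    listen_sock.setblocking(False)
    sel.register(listen_sock, selectors.EVENT_READ, accept_conn)
    return listen_sock

def run_loop(until):
    # The worker's event loop: wait for whichever sockets are ready,
    # dispatch each to its registered callback, repeat.
    while time.time() < until:
        for key, _mask in sel.select(timeout=0.1):
            key.data(key.fileobj)
```

The single thread never blocks on any one connection; it only sleeps inside `sel.select()` when no socket has work, which mirrors the state-machine behavior the article describes.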
Garrett notes that this design fundamentally differentiates NGINX from typical web servers, which often assign each connection to a separate thread. Thread‑per‑connection models simplify programming but incur heavy context‑switch costs, as worker threads spend most of their time blocked waiting for I/O. When the number of concurrent I/O operations exceeds a threshold, the cost of context switching becomes significant.
Conversely, NGINX’s design ensures that a worker only blocks when it has no work to do at all, never on an individual network operation. Each new connection consumes minimal resources: a file descriptor and a small amount of worker memory.
In summary, with appropriate system tuning, each NGINX worker can handle hundreds of thousands of concurrent HTTP connections.
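The "system tuning" involved typically means raising file-descriptor and connection limits, since by default the OS caps open descriptors well below what a busy worker needs. A sketch of the relevant nginx.conf knobs (the values are illustrative, not recommendations; OS-level limits such as `fs.file-max` and the process `ulimit -n` must be raised to match):

```nginx
worker_processes auto;

# Raise the per-worker open-file limit so connection counts
# are not capped by the OS default descriptor limit.
worker_rlimit_nofile 100000;

events {
    # Allow each worker's event loop to track many more connections.
    worker_connections 100000;
}
```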
