High-Concurrency on a Single Server: Nginx vs Apache and IO Models
This article examines practical high‑concurrency techniques for a single‑machine web server, presenting an ultra‑minimal reverse‑proxy model, comparing Nginx’s multi‑process event‑driven architecture with Apache’s prefork and worker MPMs, and reviewing synchronous, non‑blocking, and asynchronous I/O strategies to minimize performance killers such as data copying, context switches, memory allocation, and lock contention.
Modern large‑scale high‑performance sites such as Taobao, JD, Weibo, Facebook, and Zhihu involve complex architectures with layered business logic, modular software, distributed deployment, clustering, and load balancing. This article focuses on solving the smaller problems that arise after such decomposition, specifically which programming practices or models can achieve high concurrency on a single machine.
Minimal High‑Concurrency Model
Hardware: a single machine with 30 cores at 1 GHz.
Software: 6Wind fastpath; each core runs an endless run‑to‑completion loop, with no operating system underneath.
Function: a super‑simple reverse proxy that provides basic load‑balancing.
Performance: processing an IP packet (receive + process + send) takes about 2000 CPU cycles, or ~2 µs at 1 GHz, so each core handles about 500,000 packets per second and the 30 cores together roughly 15 million requests per second. The high throughput stems from the absence of OS overhead and the simplicity of packet handling (direct C function call, ~1000 cycles).
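The throughput figure follows directly from the hardware numbers above. A minimal sketch of the arithmetic, assuming every core spends all of its cycles on packet handling (the function and constant names are illustrative, not from the original):

```python
def requests_per_second(cores: int, clock_hz: float, cycles_per_packet: int) -> float:
    """Upper bound on packets per second if every cycle goes to packet handling."""
    per_core = clock_hz / cycles_per_packet  # packets one core can finish per second
    return cores * per_core


CORES = 30                # from the hardware description
CLOCK_HZ = 1e9            # 1 GHz
CYCLES_PER_PACKET = 2000  # receive + process + send

latency_us = CYCLES_PER_PACKET / CLOCK_HZ * 1e6
total = requests_per_second(CORES, CLOCK_HZ, CYCLES_PER_PACKET)
print(f"{latency_us:.1f} us per packet, {total:,.0f} packets/s")
```

This is an upper bound: any cycle lost to copying, switching, allocation, or waiting on a lock lowers the result.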
While this model is derived from embedded systems and not directly applicable to typical web development (which relies on Linux, Nginx, frameworks, libraries, etc.), it illustrates the trade‑off between performance and the convenience provided by operating systems and open‑source components.
Four major performance killers identified are:
Data copying
Context switching
Dynamic memory allocation
Lock contention
The minimal model avoids all four, so nearly every CPU cycle goes to useful packet processing.
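On a general‑purpose OS, two of these killers (data copying and dynamic memory allocation) can at least be reduced in application code. The following is a sketch, not part of the original model: it reuses one preallocated buffer for every read instead of allocating a new object per call. The buffer size and the function name `drain` are assumptions chosen for illustration.

```python
import socket

BUF_SIZE = 4096  # hypothetical fixed buffer size, allocated once


def drain(sock: socket.socket, buf: bytearray) -> int:
    """Read until the peer closes, reusing one buffer; return total bytes read."""
    view = memoryview(buf)  # slicing a memoryview does not copy the data
    total = 0
    while True:
        n = sock.recv_into(view)  # kernel data lands directly in buf
        if n == 0:                # peer closed the connection
            return total
        chunk = view[:n]          # zero-copy window onto the bytes just read
        total += len(chunk)       # a real server would parse `chunk` here
```

A call such as `drain(conn, bytearray(BUF_SIZE))` performs one allocation for the whole connection, whereas `conn.recv(4096)` in a loop creates a new bytes object on every iteration.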
Common Server‑Side Linux High‑Concurrency Programming Models
Nginx vs Apache
Nginx uses a multi‑process model: a master process initializes, binds the listening sockets, and forks multiple worker processes that inherit and share those sockets. Workers compete for an accept mutex to take new connections, and each worker then uses I/O multiplexing (select/poll/epoll/kqueue) to handle thousands of concurrent requests. The architecture is modular, event‑driven, asynchronous, and non‑blocking, with a single thread per worker.
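The master/worker structure can be sketched in a few dozen lines. This is an illustration of the idea, not Nginx's implementation: it assumes a Unix system with `os.fork`, omits the accept mutex (workers simply race on `accept`, and the loser gets `BlockingIOError`), and the names `start_workers`, `worker_loop`, and `stop_workers` are hypothetical.

```python
import os
import selectors
import signal
import socket


def worker_loop(listener: socket.socket) -> None:
    """Single-threaded event loop run by each worker process."""
    sel = selectors.DefaultSelector()  # created after fork, so not shared
    sel.register(listener, selectors.EVENT_READ)
    while True:
        for key, _events in sel.select():
            if key.fileobj is listener:
                try:
                    conn, _addr = listener.accept()
                except BlockingIOError:
                    continue  # another worker accepted this connection first
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                continue
            conn = key.fileobj
            try:
                data = conn.recv(4096)
            except BlockingIOError:
                continue
            except ConnectionError:
                data = b""
            if data:
                # Small reply; a real server would buffer partial writes.
                conn.send(b"worker %d: " % os.getpid() + data)
            else:
                sel.unregister(conn)
                conn.close()


def start_workers(num_workers: int):
    """Master: bind once, then fork workers that share the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(128)
    listener.setblocking(False)
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child process
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)
    return listener, pids


def stop_workers(pids) -> None:
    """Master: terminate and reap all workers."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

Calling `start_workers(os.cpu_count())` would give one worker per core. Each reply is prefixed with the PID of the worker that served it, which makes it easy to see connections spread across processes.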
Apache typically runs in two modes:
Prefork MPM: a master process maintains a pool of worker processes, each handling one connection at a time. The per‑process memory and context‑switch overhead make this mode relatively inefficient for high concurrency.
Worker MPM: the master spawns a configurable number of child processes, each creating a pool of threads, with one thread per connection. Threads have lower overhead than processes, so this mode achieves better concurrency than prefork.
In summary, Nginx’s worker processes typically match the number of CPU cores, minimizing process switching and memory waste, whereas Apache’s per‑connection process/thread model incurs higher overhead and cache‑miss probability, limiting achievable concurrency.
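For contrast, the thread‑per‑connection style can be sketched with a fixed thread pool. This is an illustration of the model only, not Apache's code; `serve`, `handle_connection`, and the `max_connections` limit (which lets the example terminate) are assumptions.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_connection(conn: socket.socket) -> int:
    """Blocking handler: the thread is tied to this connection until it closes."""
    total = 0
    with conn:
        while True:
            data = conn.recv(4096)  # blocks this thread while the client is idle
            if not data:
                return total
            conn.sendall(data.upper())
            total += len(data)


def serve(listener: socket.socket, pool_size: int, max_connections: int) -> list:
    """Accept `max_connections` clients and hand each one to a pool thread."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = []
        for _ in range(max_connections):
            conn, _addr = listener.accept()
            futures.append(pool.submit(handle_connection, conn))
        return [f.result() for f in futures]  # bytes handled per connection
```

With `pool_size` threads, at most `pool_size` connections make progress at once, and a slow client occupies a whole thread for as long as it stays connected. That is the overhead the comparison above attributes to this model.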
IO Strategies
Because I/O speed is far slower than CPU speed, programs must decide how to handle I/O while keeping the CPU productive. The choice of I/O strategy is closely tied to the process/thread programming model.
Synchronous Blocking IO
When a socket is in blocking mode, a read/recv call suspends the calling process or thread until data becomes available.
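The effect is easy to observe. In this sketch (the function name and the 0.2 s delay are illustrative), a timer thread sends data late, and the reading thread measures how long it was stuck inside `recv`:

```python
import socket
import threading
import time


def blocking_read_demo(delay: float = 0.2):
    """Return the data read and how long recv() kept the caller suspended."""
    reader, writer = socket.socketpair()  # sockets are blocking by default
    timer = threading.Timer(delay, writer.sendall, args=(b"late data",))
    timer.start()
    start = time.monotonic()
    data = reader.recv(1024)  # the calling thread sleeps here until data arrives
    waited = time.monotonic() - start
    timer.join()
    reader.close()
    writer.close()
    return data, waited
```

During `waited` seconds the reading thread does no work at all, which is why a blocking design needs one thread or process per connection to stay responsive.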
Synchronous Non‑Blocking IO and IO Multiplexing
A non‑blocking read on a socket that has no data returns immediately (on Linux with EAGAIN/EWOULDBLOCK) instead of suspending the caller, so the program can continue. To handle many sockets concurrently, programs use I/O multiplexing mechanisms such as select, poll, or epoll. The program blocks in a single select call that monitors a set of sockets; when any of them becomes ready, select returns and the program performs non‑blocking reads on the ready sockets. Nginx uses epoll, while Apache uses select, both following the Reactor pattern.
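Both halves of this strategy fit in a short sketch. It uses Python's `selectors` module, which picks epoll on Linux, and socket pairs in place of real clients; `read_ready` and `demo` are hypothetical names.

```python
import selectors
import socket


def read_ready(sel: selectors.BaseSelector, timeout: float = 1.0) -> dict:
    """One reactor iteration: block in select(), then read each ready socket."""
    results = {}
    for key, _events in sel.select(timeout):  # the only blocking call
        try:
            results[key.data] = key.fileobj.recv(4096)  # returns immediately
        except BlockingIOError:
            pass  # spurious wakeup: nothing to read after all
    return results


def demo():
    sel = selectors.DefaultSelector()
    pairs = []
    for name in ("a", "b", "c"):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=name)
        pairs.append((reader, writer))

    # With no data available, a non-blocking read fails at once instead of waiting.
    try:
        pairs[0][0].recv(1)
        empty_read_raised = False
    except BlockingIOError:
        empty_read_raised = True

    pairs[0][1].sendall(b"one")    # only sockets "a" and "c" receive data
    pairs[2][1].sendall(b"three")
    ready = read_ready(sel)

    for reader, writer in pairs:
        sel.unregister(reader)
        reader.close()
        writer.close()
    sel.close()
    return empty_read_raised, ready
```

One thread watches all three sockets and touches only the ones that are ready; socket "b" costs nothing while it is idle.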
Asynchronous Non‑Blocking IO
This model, often called the Proactor pattern, has the program start an I/O operation and register a callback or handler for its completion. The kernel carries out the operation and the handler is invoked once it finishes, so user space makes no blocking calls.
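Python's `asyncio` offers a completion‑style interface that gives a feel for this programming model. Note the assumption: on Linux the default asyncio event loop is itself built on a selector, so this sketch shows only the shape of the calling code, not kernel‑level asynchronous I/O. The names `read_when_complete` and `main` are illustrative.

```python
import asyncio
import socket


async def read_when_complete(sock: socket.socket) -> bytes:
    """Start a read and resume only when the data has already been delivered."""
    loop = asyncio.get_running_loop()
    return await loop.sock_recv(sock, 1024)  # no readiness check in user code


async def main() -> bytes:
    reader, writer = socket.socketpair()
    reader.setblocking(False)  # the loop.sock_* methods require non-blocking sockets
    writer.setblocking(False)
    loop = asyncio.get_running_loop()

    task = asyncio.create_task(read_when_complete(reader))
    await asyncio.sleep(0)  # let the read be issued before any data exists
    await loop.sock_sendall(writer, b"done")
    data = await task  # the continuation runs with the result in hand

    reader.close()
    writer.close()
    return data
```

Running `asyncio.run(main())` returns the bytes that were sent. The calling code never asks whether the socket is ready; it only states what should happen once the read has completed.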
Original link: https://segmentfault.com/a/1190000004547892
MaGe Linux Operations
