
Why Nginx OOMs Under Million-Connection Load and How to Fix It

During a million-connection WebSocket stress test, four Nginx servers (32 cores and 128 GB of memory each) repeatedly ran out of memory. The investigation traced the problem to oversized proxy buffers; disabling buffering and reducing the buffer sizes stabilized memory usage.

Efficient Ops

Phenomenon Description

The environment is a WebSocket stress test with millions of long-lived connections. JMeter clients run on hundreds of machines, and their traffic passes through four Nginx instances to the backend services. While the connections are idle, memory is stable; once large-scale sending and receiving starts, each of the 32 worker processes grows to nearly 4 GB, and the kernel repeatedly OOM-kills them.

[Fri Mar 13 18:46:44 2020] Out of memory: Kill process 28258 (nginx) score 30 or sacrifice child
[Fri Mar 13 18:46:44 2020] Killed process 28258 (nginx) total-vm:1092198764kB, anon-rss:3943668kB, file-rss:736kB, shmem-rss:4kB

Investigation Process

Using ss -nt on both Nginx and client sides showed a large number of ESTABLISHED connections with huge Send‑Q and Recv‑Q queues. Example output:

State      Recv-Q Send-Q Local Address:Port     Peer Address:Port
ESTAB      0      792024 1.1.1.1:80               2.2.2.2:50664
...
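
To spot such connections without reading the output by eye, the ss -nt text can be filtered for large queues. The following is a minimal Go sketch, not part of the original investigation: the function name backlogged, the 64 KB threshold, and the second sample line are assumptions made for illustration, and it expects the five-column layout shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// queued describes one connection whose socket queues exceed the threshold.
type queued struct {
	Peer         string
	RecvQ, SendQ int
}

// backlogged scans `ss -nt` style output and returns the connections whose
// Recv-Q or Send-Q is at least threshold bytes. Lines that do not match the
// expected "State Recv-Q Send-Q Local Peer" layout are skipped.
func backlogged(ssOutput string, threshold int) []queued {
	var out []queued
	sc := bufio.NewScanner(strings.NewReader(ssOutput))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 5 {
			continue
		}
		recvQ, err1 := strconv.Atoi(f[1])
		sendQ, err2 := strconv.Atoi(f[2])
		if err1 != nil || err2 != nil {
			continue // header line or unexpected format
		}
		if recvQ >= threshold || sendQ >= threshold {
			out = append(out, queued{Peer: f[4], RecvQ: recvQ, SendQ: sendQ})
		}
	}
	return out
}

func main() {
	// The first data line is from the article; the second is a made-up
	// healthy connection for contrast.
	sample := `State      Recv-Q Send-Q Local Address:Port     Peer Address:Port
ESTAB      0      792024 1.1.1.1:80               2.2.2.2:50664
ESTAB      0      0      1.1.1.1:80               3.3.3.3:40000
`
	for _, q := range backlogged(sample, 64*1024) {
		fmt.Printf("peer=%s recvq=%d sendq=%d\n", q.Peer, q.RecvQ, q.SendQ)
	}
}
```

A large Send-Q on the Nginx side means data has been written to the socket but not yet acknowledged by the peer, which is consistent with a client that cannot keep up.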

Packet captures on the JMeter client occasionally displayed many zero‑window events, suggesting the client could not keep up.

Memory dumps were taken early in the rise using pmap -x 4199, cat /proc/4199/smaps, and gdb to dump the relevant region. The dump contained a massive amount of request/response data.

pmap -x  4199 | sort -k 3 -n -r
00007f2340539000  475240  461696  461696 rw---   [ anon ]
...

Inspecting Nginx configuration revealed an unusually large proxy_buffers setting:

location / {
    proxy_pass http://xxx;
    ...
    proxy_buffer_size        64M;
    proxy_buffers            64 16M;
    proxy_busy_buffers_size        256M;
    proxy_temp_file_write_size    512M;
}
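
To see why these values are dangerous, it helps to work out a rough upper bound on response-buffer memory per proxied connection. The Go sketch below simply adds proxy_buffer_size to the product of the two proxy_buffers arguments. It assumes every buffer ends up allocated, which only happens when the client falls far behind; the function name worstCaseBytes and the stalled-connection counts are illustrative, not measurements.

```go
package main

import "fmt"

const mib = 1 << 20

// worstCaseBytes returns a rough upper bound on response-buffer memory for
// one proxied connection: the first-part buffer (proxy_buffer_size) plus
// every body buffer (proxy_buffers number * size).
func worstCaseBytes(bufferSize, numBuffers, eachBuffer int64) int64 {
	return bufferSize + numBuffers*eachBuffer
}

func main() {
	// Values from the configuration above: 64M, and 64 buffers of 16M.
	perConn := worstCaseBytes(64*mib, 64, 16*mib)
	fmt.Printf("per connection: up to %d MiB\n", perConn/mib)

	// Hypothetical numbers of fully backed-up connections on one worker.
	for _, stalled := range []int64{1, 4, 100} {
		fmt.Printf("%d stalled connections: up to %d MiB\n", stalled, stalled*perConn/mib)
	}
}
```

Under this assumption a single connection can pin up to 1088 MiB, so only a handful of fully backed-up connections are enough to push a worker into the multi-gigabyte range.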

Simulating Nginx Memory Rise

A slow‑receiving client was written in Go to mimic a bottleneck:

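package main

import (
    "fmt"
    "log"
    "net"
    "time"
)

func main() {
    conn, err := net.Dial("tcp", "10.211.55.10:80")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // HTTP requires CRLF line endings and an empty line to end the headers.
    request := "GET /demo.mp4 HTTP/1.1\r\nHost: ya.test.me\r\n\r\n"
    if _, err := fmt.Fprint(conn, request); err != nil {
        log.Fatal(err)
    }

    // Read a single byte every 3 seconds so that the response backs up
    // in the socket buffers and then in Nginx.
    buf := make([]byte, 1)
    for {
        if _, err := conn.Read(buf); err != nil {
            log.Fatal(err)
        }
        time.Sleep(3 * time.Second)
        fmt.Println("read one byte")
    }
}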

Running this program while monitoring Nginx with pidstat -p pid -r 1 1000 showed memory jumping to ~450 MB within seconds and staying high. Launching two such clients caused memory to exceed 900 MB.
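
The same resident-memory figure can also be read directly from /proc on Linux. The following Go sketch is an illustration rather than the tooling used in the article: parseVmRSS is a hypothetical helper that extracts the VmRSS line from the text of /proc/<pid>/status, and main reads the current process only so that the example runs without knowing an Nginx PID.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size in kB from the text of
// /proc/<pid>/status. It returns false if no valid VmRSS line is found.
func parseVmRSS(status string) (int64, bool) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		f := strings.Fields(strings.TrimPrefix(line, "VmRSS:"))
		if len(f) == 0 {
			return 0, false
		}
		kb, err := strconv.ParseInt(f[0], 10, 64)
		return kb, err == nil
	}
	return 0, false
}

func main() {
	// Replace "self" with the PID of an Nginx worker to watch that process.
	path := "/proc/self/status"
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("cannot read", path+":", err)
		return
	}
	if kb, ok := parseVmRSS(string(data)); ok {
		fmt.Printf("VmRSS: %d kB (%.1f MiB)\n", kb, float64(kb)/1024)
	}
}
```

Calling this in a loop with a one-second sleep gives roughly the same view of resident memory as the pidstat command.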

Solution

Because each connection could be allocated very large buffers, total memory grew with the number of connections. The quickest fix was to turn off response buffering with proxy_buffering off; and to lower proxy_buffer_size. With these changes, memory stabilized at around 20 GB in the stress test and grew by only about 64 MB when the test was repeated.

When buffering is enabled, Nginx reads the upstream response as quickly as it can and stores it in the buffers configured by proxy_buffer_size and proxy_buffers. If the response does not fit in memory, part of it is written to a temporary file. When buffering is disabled, Nginx passes the response to the client synchronously as it arrives, and proxy_buffer_size sets the maximum amount of data it reads from the upstream at a time.

Nginx Source Analysis

The upstream read routine resides in src/event/ngx_event_pipe.c in the function ngx_event_pipe_read_upstream. It creates temporary buffers based on p->bufs.num and p->bufs.size, which correspond to the proxy_buffers directive.

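static ngx_int_t
ngx_event_pipe_read_upstream(ngx_event_pipe_t *p)
{
    // Abridged excerpt: declarations (including b) and the read logic are omitted.
    for ( ;; ) {
        if (p->free_raw_bufs) {
            // Reuse a buffer that has already been freed.
            // ...
        } else if (p->allocated < p->bufs.num) {
            // Still below the proxy_buffers count: allocate one more
            // buffer of the configured size from the pipe's pool.
            b = ngx_create_temp_buf(p->pool, p->bufs.size);
            if (b == NULL) {
                return NGX_ABORT;
            }
            p->allocated++;
        }
        // ...
    }
}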
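
The branch order in this excerpt can be modeled in a few lines of Go to show how allocation behaves when the client stalls. This is a simplified model, not a port of the Nginx code: the pipe type, its fields, and the small sizes in main are assumptions for illustration, and the temporary-file path is left out.

```go
package main

import "fmt"

// pipe is a toy model of the buffer bookkeeping in the excerpt.
type pipe struct {
	num, size int      // stand-ins for the proxy_buffers number and size
	allocated int      // buffers created so far
	free      [][]byte // buffers handed back after the client consumed them
}

// nextBuf follows the same order as the excerpt: reuse a free buffer if
// there is one, otherwise allocate while under the limit, otherwise report
// that no buffer is available.
func (p *pipe) nextBuf() ([]byte, bool) {
	if n := len(p.free); n > 0 {
		b := p.free[n-1]
		p.free = p.free[:n-1]
		return b, true
	}
	if p.allocated < p.num {
		p.allocated++
		return make([]byte, p.size), true
	}
	return nil, false
}

// release returns a buffer to the free list once its data has been sent.
func (p *pipe) release(b []byte) {
	p.free = append(p.free, b)
}

func main() {
	p := &pipe{num: 4, size: 1024} // small hypothetical values

	// A stalled client never releases buffers, so every read from the
	// upstream allocates a new one until the limit is reached.
	var held [][]byte
	for {
		b, ok := p.nextBuf()
		if !ok {
			break
		}
		held = append(held, b)
	}
	fmt.Printf("stalled client: %d buffers, %d bytes held\n", p.allocated, p.allocated*p.size)

	// Once the client drains one buffer, it is reused instead of growing.
	p.release(held[0])
	_, ok := p.nextBuf()
	fmt.Printf("after release: got buffer=%v, allocated=%d\n", ok, p.allocated)
}
```

With a fast client, buffers cycle through the free list and few are ever allocated. With a slow client, allocation climbs to the configured limit, which is why the size of that limit matters so much.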

Postscript

Additional diagnostics such as strace and systemtap can reveal allocation paths in black‑box programs. The test also uncovered that an unreasonable worker_connections setting can cause Nginx to consume 14 GB of memory even without massive traffic. Understanding low‑level mechanisms and proper tuning are essential for high‑concurrency deployments.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
