How a Hidden Compression Bomb Triggered OOM Crashes in an Nginx Data Gateway

An unusual request drove memory usage up sharply until the OOM killer terminated workers of an Nginx‑based data‑collection gateway. The investigation traced the root causes to a compression‑bomb‑style payload and coarse memory‑pool allocation.

Alibaba Cloud Developer

Oct 30, 2024

Problem Background

The data‑collection gateway, built on Nginx, started experiencing occasional worker process crashes. The master process restarted the workers each time, and the crashes were eventually traced to out‑of‑memory (OOM) conditions.

Initial Analysis

Memory usage on the host appeared stable (around 40% of RAM), but the metrics were sampled once per minute and missed short spikes. No core‑dump files were generated either, because the OOM killer sends SIGKILL, which cannot be caught and terminates the process without writing a core dump.

When you hit a dead end in a maze, you must reconsider the previous steps.

Discovering the OOM Trigger

Second‑level monitoring revealed that, at the crash moment, a worker’s memory jumped from a few hundred MB to over 10 GB within seconds, causing the kernel to kill the process.

To obtain a core dump, a user‑space helper was added that monitors worker memory and, when a threshold is exceeded, sends SIGABRT to force a dump before the OOM killer steps in.
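A minimal sketch of such a helper, assuming a Linux host where a worker's resident memory can be read from `/proc/<pid>/status`. The threshold `RSS_LIMIT_KB` and the function names are hypothetical; the article does not show the real implementation.

```python
import os
import signal

RSS_LIMIT_KB = 4 * 1024 * 1024  # hypothetical threshold: 4 GiB, expressed in kB


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # some processes (e.g. kernel threads) have no VmRSS line


def read_rss_kb(pid: int) -> int:
    """Read the current resident set size of a process in kB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def check_worker(pid: int, limit_kb: int = RSS_LIMIT_KB) -> bool:
    """Send SIGABRT to the worker if its RSS exceeds the limit."""
    if read_rss_kb(pid) > limit_kb:
        os.kill(pid, signal.SIGABRT)  # default action: terminate and dump core
        return True
    return False
```

Because the spike takes only seconds, a helper like this would have to poll at least once per second. A core file is only written if core dumps are enabled for the worker, for example through Nginx's `worker_rlimit_core` and `working_directory` directives.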

Memory‑Pool Investigation

The gateway processes data in stages: request reception, processing, batching, and sending. Each batch creates a memory pool (≈3 MB) that is released only after the HTTP request is fully sent.

Under normal load, the pool is quickly freed, keeping memory usage low. However, if many batch‑write requests are created faster than they can be sent, memory pools accumulate.
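The accumulation effect can be illustrated with a toy model. This is a sketch under simplifying assumptions (fixed creation and send rates per time step, one pool per pending request); the function and its parameters are hypothetical and not part of the gateway.

```python
POOL_BYTES = 3 * 1024 * 1024  # the ≈3 MB pool created for each batch


def peak_pool_bytes(created_per_tick: int, sent_per_tick: int, ticks: int) -> int:
    """Peak memory held by pools whose requests have not been sent yet."""
    in_flight = 0
    peak = 0
    for _ in range(ticks):
        in_flight += created_per_tick                 # new batches allocate pools
        peak = max(peak, in_flight)
        in_flight -= min(in_flight, sent_per_tick)    # finished sends free pools
    return peak * POOL_BYTES


# Balanced load: pools are freed as fast as they are created.
balanced = peak_pool_bytes(created_per_tick=10, sent_per_tick=10, ticks=100)

# Burst: ten times more requests are created than can be sent.
burst = peak_pool_bytes(created_per_tick=100, sent_per_tick=10, ticks=10)
```

In the balanced case the peak stays at 10 pools, while the ten-step burst leaves 910 pools outstanding at its peak.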

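For reference, this excerpt from the signal(7) manual page shows why SIGKILL leaves no core dump (action Term) while SIGABRT, listed here under its synonym SIGIOT, produces one (action Core):

Signal      Standard   Action   Comment
───────────────────────────────────────
SIGIOT      -          Core     IOT trap (synonym for SIGABRT)
SIGKILL     P1990      Term     Kill signal
SIGLOST     -          Term     File lock lost (unused)
...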

Root Cause: Compression Bomb

Testing with mock data showed that a payload of 10,000 identical events (≈34.7 MB uncompressed) compressed to only about 1.2 MB, a compression ratio of roughly 28.6×. The gateway’s 4 MB body limit applied to the compressed payload only, so a request well under the limit could still expand into a massive number of events.

origin data bytes: 34697723
compressed data bytes: 1214252
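Dividing the two byte counts gives the ratio directly. The sketch below does that and then builds a similar payload with `zlib` to show how well repeated events compress. The event structure is made up for illustration (the article does not show the real schema or compressor), so the demo ratio will differ from the reported one.

```python
import json
import zlib

# Byte counts reported in the test output above.
ORIGIN_BYTES = 34_697_723
COMPRESSED_BYTES = 1_214_252


def compression_ratio(raw_len: int, compressed_len: int) -> float:
    """Uncompressed size divided by compressed size."""
    return raw_len / compressed_len


reported_ratio = compression_ratio(ORIGIN_BYTES, COMPRESSED_BYTES)

# Hypothetical event; 10,000 identical copies, one JSON document per line.
event = {"event": "page_view", "user_id": "u-0001", "props": {"page": "/home"}}
raw = "\n".join(json.dumps(event) for _ in range(10_000)).encode()
packed = zlib.compress(raw)
demo_ratio = compression_ratio(len(raw), len(packed))

print(f"reported: {reported_ratio:.1f}x, demo: {demo_ratio:.1f}x")
```

A size check on `len(packed)` alone says little about how much memory the request will need after decompression, which is the gap the gateway's 4 MB limit left open.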

Each batch of ~25.6 KB triggered a write request, so a 32 MB payload generated about 1,250 write requests, each holding its own ≈3 MB memory pool until it was fully sent. With three parallel output channels, the memory demand exceeded 10 GB.
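The estimate can be reproduced with a few lines of arithmetic. This sketch assumes decimal units and that every write request on every channel holds its own ≈3 MB pool at the peak; the article does not spell out the exact accounting, so treat the result as an order-of-magnitude check.

```python
PAYLOAD_BYTES = 32_000_000  # "32 MB" of decompressed data (decimal MB assumed)
BATCH_BYTES = 25_600        # "~25.6 KB" per batch
POOL_BYTES = 3_000_000      # "≈3 MB" memory pool per batch
CHANNELS = 3                # parallel output channels

write_requests = PAYLOAD_BYTES // BATCH_BYTES
peak_bytes = write_requests * CHANNELS * POOL_BYTES

print(f"{write_requests} write requests, ~{peak_bytes / 1e9:.2f} GB of pools")
```

Under these assumptions the result is 1,250 write requests and about 11.25 GB of pool memory, which is consistent with the spike of more than 10 GB seen in monitoring.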

Solution

Added limits on the number of schema events per request.

Released raw data memory immediately after compression, keeping only the compressed payload.

Made memory‑pool allocation dynamic instead of a fixed 3 MB per request.

Considered rewriting critical components in a memory‑safe language such as Rust.
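As an illustration of the first fix, the sketch below enforces limits while decompressing instead of afterwards, so an oversized payload is rejected before it is fully expanded. It assumes a zlib or gzip body with newline-terminated events; the cap values, `PayloadTooLarge`, and `inflate_bounded` are hypothetical names, and the gateway itself would implement this inside Nginx rather than in Python.

```python
import zlib

MAX_DECOMPRESSED_BYTES = 8 * 1024 * 1024  # hypothetical cap on expanded size
MAX_EVENTS = 2_000                        # hypothetical cap on events per request
CHUNK = 64 * 1024                         # expand at most 64 KiB per step


class PayloadTooLarge(Exception):
    """Raised when a request exceeds the decompressed-size or event limit."""


def inflate_bounded(body: bytes,
                    max_bytes: int = MAX_DECOMPRESSED_BYTES,
                    max_events: int = MAX_EVENTS) -> bytes:
    """Decompress body step by step, aborting as soon as a limit is exceeded."""
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)  # auto-detect zlib or gzip header
    out = bytearray()
    events = 0

    def take(chunk: bytes) -> None:
        nonlocal events
        out.extend(chunk)
        events += chunk.count(b"\n")  # one newline terminates one event
        if len(out) > max_bytes or events > max_events:
            raise PayloadTooLarge(f"{len(out)} bytes, {events} events so far")

    data = body
    while data:
        take(d.decompress(data, CHUNK))  # output is capped at CHUNK bytes
        data = d.unconsumed_tail         # input not yet processed
    take(d.flush())
    return bytes(out)
```

Because each step expands at most `CHUNK` bytes, the memory used before rejection stays close to the configured cap no matter how high the compression ratio is.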

Conclusion

After months of hypothesis, testing, and verification, the “time‑bomb” was eliminated. The case highlights the importance of fine‑grained memory management, proper payload size checks, and thorough monitoring in high‑throughput backend services.
