
How to Debug Rare Core Dumps in High‑Concurrency Nginx: Tools & Strategies

This article shares a detailed post‑mortem of debugging extremely rare core dumps and memory leaks in a high‑concurrency Nginx HTTPS implementation, covering root‑cause analysis, custom stress‑test frameworks, and the use of tools such as gdb, valgrind, AddressSanitizer, perf and flame graphs to locate and fix the issues.

21CTO

Jun 18, 2016


Project Background

We made deep modifications to Nginx's event framework and to the OpenSSL protocol stack to speed up full HTTPS handshakes. Before the change, RSA computation on a single core handled only about 400 qps, so even a 24‑core machine was capped at roughly 24 × 400 ≈ 9,600 qps, and that only with perfectly linear scaling.

Observed Problems

Extremely low‑probability core dumps (≈1 in 10⁹) appeared during high‑load tests, often clustering at specific times.

Severe memory leaks appeared once load exceeded ten thousand qps, with memory growing by about 500 MiB per hour.

Difficulty identifying performance hotspots and bottlenecks after the code changes.

Core Dump Debugging

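Initial attempts with gdb and debug logs proved ineffective. Nginx uses a multi‑process, fully asynchronous event model, so one request's logic is split across separate read and write event handlers, and the backtrace at a crash shows only the handler that happened to be running, not the sequence of events that led to it.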
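Whatever gdb's limits here, it can only analyze a crash if the worker actually leaves a core file, which the kernel writes only when the process's RLIMIT_CORE limit allows it. Nginx exposes this limit through its worker_rlimit_core directive (and working_directory chooses where the file lands). The C sketch below shows the underlying POSIX calls in a standalone program; enable_core_dumps is a hypothetical helper, not code from the project.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft core-file size limit to
   its hard limit so that a crash (e.g. SIGSEGV on a NULL dereference) can
   leave a core file for gdb. Raising the soft limit up to the hard limit
   needs no extra privileges. Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;  /* the hard limit may be RLIM_INFINITY */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

A program that calls enable_core_dumps() at startup and later dereferences NULL should leave a core file (its name and location depend on the kernel's core_pattern setting); running `gdb ./program core` and then `bt` prints the backtrace of the crashing call.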

Locate the crash sites by running gdb's backtrace (bt) on the core files, and recognize that many crashes stem from NULL pointer dereferences.

Apply defensive checks (if the pointer is NULL, return) to stop the immediate crashes, while acknowledging that this merely masks the underlying issue.

Realize that asynchronous event programming hides the origin of NULL values, as the logical flow is split across multiple callbacks.

Defensive programming that simply returns on NULL may prevent the crash, but the user's request can still fail silently; crashing at the NULL dereference is arguably the more responsible behavior, because the resulting core dump points straight at the fault.
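The trade-off between the two behaviors can be sketched in C. The conn_t type, the event names, and the trail buffer are hypothetical illustrations, not Nginx's real structures. Recording a short per-connection history of events is a common companion technique that the original does not describe; it is included here to show how a fail-fast crash can carry some of the context that the split callbacks otherwise lose.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TRAIL_LEN 8  /* number of recent events kept per connection (arbitrary) */

/* Hypothetical connection object; not Nginx's ngx_connection_t. */
typedef struct {
    unsigned long id;
    void *ssl;                     /* may be NULL if an earlier callback failed */
    const char *trail[TRAIL_LEN];  /* ring buffer of recent event names */
    unsigned trail_pos;            /* total events recorded so far */
} conn_t;

/* Record an event name; names must outlive the connection (e.g. literals). */
static void conn_trace(conn_t *c, const char *event) {
    c->trail[c->trail_pos % TRAIL_LEN] = event;
    c->trail_pos++;
}

/* Print the recorded events to stderr, oldest first. */
static void conn_dump_trail(const conn_t *c) {
    unsigned n = c->trail_pos < TRAIL_LEN ? c->trail_pos : TRAIL_LEN;
    fprintf(stderr, "conn %lu: last %u events:\n", c->id, n);
    for (unsigned i = c->trail_pos - n; i < c->trail_pos; i++)
        fprintf(stderr, "  %s\n", c->trail[i % TRAIL_LEN]);
}

/* Defensive variant: avoids the crash, but the request silently fails
   and nothing records why ssl was NULL. */
static int on_write_defensive(conn_t *c) {
    conn_trace(c, "write");
    if (c->ssl == NULL)
        return -1;
    return 0;
}

/* Fail-fast variant: dump the event history, then abort() so the process
   leaves a core file whose backtrace points at this check. */
static int on_write_fail_fast(conn_t *c) {
    conn_trace(c, "write");
    if (c->ssl == NULL) {
        conn_dump_trail(c);
        abort();
    }
    return 0;
}
```

Because the trail lives in the connection object, it is also part of the core file, so it can usually be inspected from gdb even when stderr output was lost.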

Improving Logging

Enabling full DEBUG logging made the bug disappear: the massive log I/O cut QPS so sharply that the high‑load conditions needed to trigger it were never reached. Alternative approaches included per‑IP debug logging, custom high‑severity logs, and sampling specific connections, yet none provided sufficient insight.
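Nginx itself offers per-client debug logging through the debug_connection directive in builds configured with --with-debug. Connection sampling can be sketched as below, assuming each connection carries a numeric ID; should_trace_conn and rate_per_million are hypothetical names, and the splitmix64 finalizer is simply one convenient way to spread sequential IDs evenly.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64 finalizer: spreads sequential connection IDs evenly */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Decide once per connection whether to log it verbosely.
   rate_per_million is a hypothetical tuning knob: 10000 means about 1%. */
static int should_trace_conn(uint64_t conn_id, uint32_t rate_per_million) {
    return (mix64(conn_id) % 1000000u) < rate_per_million;
}
```

Keying the decision on the connection ID rather than on individual log calls means a sampled connection is logged from accept to close, which keeps its asynchronous flow readable, while total log volume stays proportional to the sampling rate.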

Constructing a Reliable Stress‑Test System

To reproduce the bug consistently, we built a distributed testing framework around wrk (a multi‑threaded, asynchronous HTTP load generator), coordinating multiple client machines to generate tens of thousands of QPS.

We also crafted abnormal request scenarios:

Randomly close TCP sockets during the connect phase.

Randomly abort SSL handshakes at the client‑hello or client‑key‑exchange stages.

Send HTTPS requests encrypted with an incorrect public key, forcing decryption failures.
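A client that mixes these abnormal cases into normal traffic needs some way to choose, per connection, which fault to inject. The sketch below assumes a seeded chooser; the enum, the percentages, and the xorshift64 generator are illustrative assumptions, and the actual socket and TLS manipulation each case requires is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Where a hypothetical test client deliberately misbehaves. */
typedef enum {
    FAULT_NONE = 0,
    FAULT_CLOSE_DURING_CONNECT,     /* close the TCP socket mid-connect */
    FAULT_ABORT_AFTER_CLIENT_HELLO, /* drop the TLS handshake after ClientHello */
    FAULT_ABORT_AT_KEY_EXCHANGE,    /* drop it at ClientKeyExchange */
    FAULT_WRONG_PUBLIC_KEY,         /* encrypt with the wrong public key */
    FAULT_COUNT
} fault_t;

/* Deterministic xorshift64 PRNG so a failing run can be replayed from its
   seed. The state must be non-zero. */
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return *s = x;
}

/* Pick the fault for the next connection. percent[f] is the chance (0-100)
   of fault f; percent[FAULT_NONE] is ignored, and the remainder means a
   normal, well-behaved request. */
static fault_t pick_fault(uint64_t *rng, const unsigned percent[FAULT_COUNT]) {
    unsigned roll = (unsigned)(xorshift64(rng) % 100);
    unsigned acc = 0;
    for (int f = FAULT_CLOSE_DURING_CONNECT; f < FAULT_COUNT; f++) {
        acc += percent[f];
        if (roll < acc) return (fault_t)f;
    }
    return FAULT_NONE;
}
```

Logging the seed at the start of each client run means a rare crash can be replayed with the same sequence of faults, which matters when failures are as infrequent as the ones described here.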

Memory‑Leak Detection Tools

We evaluated valgrind, which requires no recompilation but slows execution by 10‑50×, making it unsuitable for leak detection under high load. AddressSanitizer (ASan) is a faster alternative (roughly 2× slowdown) but requires recompiling with -fsanitize=address. Building with clang and the appropriate flags (for example, -fsanitize=address -fno-omit-frame-pointer -g) let us catch the leak while keeping throughput high enough to reproduce it.
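The kind of bug ASan's LeakSanitizer catches can be sketched with an error path that forgets to free a buffer. The handshake_t type and the functions below are hypothetical stand-ins, not the project's OpenSSL code; the 48-byte buffer mirrors the size of a TLS RSA pre-master secret, and the SIMULATE_LEAK macro switches in the buggy path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PREMASTER_LEN 48  /* TLS pre-master secret size for RSA key exchange */

/* Hypothetical per-handshake state, not OpenSSL's real structures. */
typedef struct {
    unsigned char *premaster;
} handshake_t;

/* Simulated key-exchange step. decrypt_ok stands in for the RSA decryption
   result; a client using the wrong public key makes it fail.
   Returns 0 on success, -1 on failure. */
static int key_exchange(handshake_t *hs, int decrypt_ok) {
    hs->premaster = malloc(PREMASTER_LEN);
    if (hs->premaster == NULL)
        return -1;
    if (!decrypt_ok) {
#ifndef SIMULATE_LEAK
        free(hs->premaster);   /* correct error path releases the buffer */
#endif
        hs->premaster = NULL;  /* under SIMULATE_LEAK the block is now unreachable */
        return -1;
    }
    memset(hs->premaster, 0, PREMASTER_LEN);
    return 0;
}

/* Release the buffer when the connection closes (free(NULL) is a no-op). */
static void handshake_cleanup(handshake_t *hs) {
    free(hs->premaster);
    hs->premaster = NULL;
}

/* Drive n handshakes, failing every fail_every-th one; returns failures. */
static int run_handshakes(int n, int fail_every) {
    int failures = 0;
    for (int i = 0; i < n; i++) {
        handshake_t hs = { NULL };
        if (key_exchange(&hs, (i + 1) % fail_every != 0) != 0)
            failures++;
        handshake_cleanup(&hs);
    }
    return failures;
}
```

Called from a small main and built with `clang -g -fsanitize=address -fno-omit-frame-pointer -DSIMULATE_LEAK`, LeakSanitizer (enabled by default with ASan on x86-64 Linux) should report the 48-byte blocks allocated in key_exchange when the process exits; without -DSIMULATE_LEAK it should report nothing. Because the report is produced at exit, a long-running server has to be shut down cleanly for it to appear.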

Performance Hotspot Analysis

We compared several profiling utilities:

perf: the most comprehensive option, a Linux profiler covering both kernel and user space, whose samples can be turned into flame graphs.

oprofile: older and less convenient, largely superseded by perf.

gprof: an application‑level profiler that requires recompiling with -pg.

systemtap: a powerful dynamic tracing framework for complex cases.

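Using perf, we recorded samples and generated flame graphs to visualize CPU consumption per function. The following commands produce a flame graph for Nginx under an ECDHE‑RSA cipher suite (PID is the ID of the Nginx process to profile; stackcollapse-perf.pl and flamegraph.pl come from Brendan Gregg's FlameGraph scripts):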

perf record -F 99 -p PID -g -- sleep 10
perf script | ./stackcollapse-perf.pl > out.perf-folded
./flamegraph.pl out.perf-folded > out.svg

[Figure: flame graph showing RSA functions dominating CPU time]

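The flame graph revealed that rsaz_1024_mul_avx2 and rsaz_1024_sqr_avx2, OpenSSL's AVX2 big‑number multiply and square routines used in RSA private‑key operations, accounted for 75% of samples, which pointed our optimization work squarely at the RSA computation.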
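A percentage like this can be read directly from the folded file: each line of out.perf-folded is a semicolon-separated stack followed by its sample count, so a frame's inclusive share is the sum of counts for stacks containing it divided by the total. A minimal C sketch, assuming the count is the last space-separated field on each line as stackcollapse-perf.pl emits it; the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if frame is a whole ';'-separated element of stack[0..len). */
static int stack_has_frame(const char *stack, size_t len, const char *frame) {
    size_t flen = strlen(frame);
    const char *p = stack, *end = stack + len;
    for (;;) {
        const char *semi = memchr(p, ';', (size_t)(end - p));
        const char *stop = semi ? semi : end;
        if ((size_t)(stop - p) == flen && memcmp(p, frame, flen) == 0)
            return 1;
        if (semi == NULL)
            return 0;
        p = semi + 1;
    }
}

/* Sum the sample counts in perf-folded text ("a;b;c COUNT" per line).
   Stores the grand total in *total and returns the samples whose stack
   contains any of the given frames (each stack counted at most once). */
static unsigned long folded_samples_with(const char *folded,
                                         const char *const frames[], size_t nframes,
                                         unsigned long *total) {
    unsigned long hit = 0;
    *total = 0;
    const char *line = folded;
    while (*line) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        const char *sp = NULL;  /* last space on the line precedes the count */
        for (size_t i = 0; i < len; i++)
            if (line[i] == ' ') sp = line + i;
        if (sp != NULL) {
            unsigned long n = strtoul(sp + 1, NULL, 10);
            *total += n;
            for (size_t f = 0; f < nframes; f++) {
                if (stack_has_frame(line, (size_t)(sp - line), frames[f])) {
                    hit += n;
                    break;
                }
            }
        }
        line = nl ? nl + 1 : line + len;
    }
    return hit;
}
```

Loading out.perf-folded into a string and passing the two rsaz_* names would give their combined share as hit / total; counting each stack once avoids double-counting stacks that contain both routines.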

[Figure: optimized Nginx performance after async proxy offload]

Mindset

Debugging such elusive bugs is a valuable learning opportunity. Treat each crash as a chance to deepen your tool expertise, discuss problems openly with teammates, and stay positive and persistent, even when the process feels exhausting.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference.
