Memory Leak Detection and Performance Hotspot Analysis for High‑Concurrency Nginx Testing

The article details how to identify and resolve memory leaks and performance bottlenecks in high‑concurrency Nginx workloads using tools such as Valgrind, AddressSanitizer, perf and flame‑graphs, while also sharing practical tips and personal reflections on debugging under pressure.

Tencent Architect

This article continues from the previous post on high-concurrency performance testing and focuses on a memory leak that grew by roughly 500 MB per hour during stress tests.
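A leak rate like this can be estimated by sampling a worker's resident set size at two points in time. The following is a minimal sketch, assuming a Linux /proc filesystem. The helper names `rss_kb` and `growth_mb_per_hour` are made up for illustration and are not from the original article.

```shell
#!/bin/sh
# Read the resident set size (in kB) of process $1 from /proc (Linux only).
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Convert an RSS change into MB per hour.
# $1 = starting RSS in kB, $2 = ending RSS in kB, $3 = elapsed seconds
growth_mb_per_hour() {
  awk -v a="$1" -v b="$2" -v s="$3" \
    'BEGIN { printf "%.1f\n", (b - a) / 1024 * 3600 / s }'
}

# Usage (hypothetical worker PID 12345, 60-second window):
#   start=$(rss_kb 12345); sleep 60; end=$(rss_kb 12345)
#   growth_mb_per_hour "$start" "$end" 60
```

A short window is noisy, so several samples under steady load give a more trustworthy trend than a single pair.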

Valgrind is introduced as a powerful memory-error detector that runs against an existing binary without recompiling it and can catch uninitialized reads, out-of-bounds accesses, double frees and similar errors. For Nginx, the author recommends passing --trace-children=yes so that Valgrind follows the forked worker processes, compiling OpenSSL with -DPURIFY to silence false positives from its rand code, and limiting the number of worker processes to keep the logs readable. However, Valgrind's memcheck slows the program down by 10-50×, which makes it unsuitable for reproducing leaks that only appear under heavy load.
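The recommendations above might translate into an invocation like the one sketched here. The binary path, log directory and the `run_nginx_under_valgrind` wrapper are assumptions for illustration. Only --trace-children=yes and the single-worker setup come from the text. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Assumed locations; adjust for the local install.
NGINX_BIN="${NGINX_BIN:-/usr/local/nginx/sbin/nginx}"
LOG_DIR="${LOG_DIR:-/tmp}"

run_nginx_under_valgrind() {
  # --trace-children=yes follows the forked worker processes.
  # %p in --log-file expands to the PID, giving one log per process.
  # -g passes global directives: stay in the foreground, use one worker.
  set -- valgrind --tool=memcheck --leak-check=full \
    --trace-children=yes \
    --log-file="$LOG_DIR/valgrind-nginx.%p.log" \
    "$NGINX_BIN" -g 'daemon off; worker_processes 1;'
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command without executing it
  else
    "$@"
  fi
}
```

If nginx.conf already sets daemon or worker_processes, Nginx will likely reject the same directives passed via -g as duplicates. In that case they belong in the configuration file.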

AddressSanitizer (ASan) is presented as a faster alternative that cuts the slowdown to about 2×. ASan requires recompiling the program with the -fsanitize=address flag, for example by passing --with-cc=clang --with-cc-opt="-g -fPIC -fsanitize=address -fno-omit-frame-pointer" to Nginx's configure script when building with Clang. The author notes that this low overhead made it possible to catch the same memory leak during large-scale tests, and that the leak itself was traced to custom OpenSSL error-handling code.
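A hedged sketch of how such an ASan build could be scripted. The compiler flags are the ones quoted in the text. The --with-ld-opt flag is an addition assumed here, since the sanitizer runtime generally also has to be requested at link time. The source path and the `configure_nginx_asan` wrapper are hypothetical. DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Assumption: NGINX_SRC is an unpacked Nginx source tree (hypothetical path).
NGINX_SRC="${NGINX_SRC:-$HOME/src/nginx}"
ASAN_CFLAGS='-g -fPIC -fsanitize=address -fno-omit-frame-pointer'

configure_nginx_asan() {
  # Compile with Clang and ASan; also pass the sanitizer flag to the linker.
  set -- ./configure --with-cc=clang \
    --with-cc-opt="$ASAN_CFLAGS" \
    --with-ld-opt='-fsanitize=address'
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command without executing it
  else
    (cd "$NGINX_SRC" && "$@" && make)
  fi
}
```

Leak reports are typically produced when a process exits, so stopping the workers gracefully rather than killing them is probably necessary to see them.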

After eliminating core dumps and memory‑leak risks, the focus shifts to performance hotspot analysis. The author lists several Linux profiling tools:

perf: a comprehensive, kernel-provided profiler capable of recording and visualizing hotspots.

oprofile: an older tool largely superseded by perf.

gprof: source-level profiler requiring recompilation, useful for fine-grained function-level metrics.

systemtap: a powerful dynamic tracing framework for complex performance problems.

To make profiling results more intuitive, the article introduces flame graphs. By running:

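# Sample the target process (replace PID) at 99 Hz with call graphs for 10 seconds
perf record -F 99 -p PID -g -- sleep 10
# Fold the sampled stacks, then render the SVG (scripts from the FlameGraph repository)
perf script | ./stackcollapse-perf.pl > out.perf-folded
./flamegraph.pl out.perf-folded > out.svg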

the author generated a flame graph that highlighted rsaz_1024_mul_avx2 and rsaz_1024_sqr_avx2 as consuming roughly 75 % of sampled cycles, guiding further optimization efforts.
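The share of samples attributed to a hotspot can also be read from the folded-stack file without opening the SVG. Below is a minimal sketch, assuming the usual folded format of one semicolon-separated stack per line followed by a sample count. The `hotspot_share` helper is hypothetical and not part of the FlameGraph scripts.

```shell
#!/bin/sh
# Print the percentage of samples whose stack contains the substring $1,
# reading folded stacks from file $2 (format: "frame1;frame2;frame3 <count>").
hotspot_share() {
  awk -v pat="$1" '
    {
      n = $NF                      # last field is the sample count
      total += n
      stack = $0
      sub(/ [0-9]+$/, "", stack)   # strip the trailing count
      if (index(stack, pat) > 0) hit += n
    }
    END { if (total > 0) printf "%.1f\n", 100 * hit / total }
  ' "$2"
}

# Usage: share of samples passing through either RSA routine named above
#   hotspot_share rsaz_1024_ out.perf-folded
```

Matching on the common prefix rsaz_1024_ counts both routines at once. Each folded line is counted at most once, so a stack containing both frames is not double counted.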

Two illustrative images are included to show the raw flame‑graph and the post‑optimization view where RSA‑related hotspots have disappeared.

Finally, the author shares a personal “mindset” section, emphasizing that debugging is a valuable learning opportunity, encouraging open discussion of bugs, and reflecting on the mental challenges faced during the three‑week effort to resolve core dumps and memory leaks.
