How to Diagnose and Fix Memory Leaks in a Containerized Image Thumbnail Service
This guide walks through a systematic, step‑by‑step process for identifying, analyzing, and resolving memory‑related incidents in a high‑traffic thumbnail generation service running in Kubernetes, covering everything from initial symptom checks with free and vmstat to deep dives using smem, pmap, smaps, perf, and post‑mortem verification.
Incident Overview
A new version of an image thumbnail service caused a rapid increase in P99 latency, a drop in MemAvailable, and intermittent OOMKilled events within 90 minutes of deployment. The post‑mortem focuses on a systematic memory‑diagnostic workflow using four key tools:
free – assess overall system memory health.
smem – separate private (USS/PSS) from shared memory.
pmap / smaps – identify which mapping types (anonymous, file‑backed, shared) are growing.
perf – locate allocation hotspots in the call stack.
Step‑by‑Step Diagnostic Procedure
1. Confirm System‑Wide Memory Pressure
Run free -h and verify that available continuously declines while buff/cache stays stable. If available is steady, the issue is likely cache‑related, not a leak.
Check kernel pressure signals:
cat /proc/pressure/memory
Rising some and full averages indicate genuine memory pressure.
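The PSI check above is easy to script. This is a minimal sketch, and the helper name `psi_some_avg10` is a hypothetical illustration, not part of the original tooling:

```shell
#!/bin/sh
# Hypothetical helper: print the avg10 value from the "some" line of a
# PSI file such as /proc/pressure/memory. A sustained non-zero value
# here corroborates genuine memory pressure, per the check above.
psi_some_avg10() {
  awk '/^some/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^avg10=/) { sub("avg10=", "", $i); print $i }
  }' "$1"
}

# Usage sketch: sample every 10 s during the incident window.
# psi_some_avg10 /proc/pressure/memory
```

Parsing only the `avg10` field keeps the sample cheap enough to run in a tight polling loop during an incident.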
2. Align Node and Pod Views
kubectl top pod -n media
kubectl get events -n media --sort-by=.lastTimestamp
Correlate pod‑level OOMKilled events with node‑wide MemAvailable trends.
3. Distinguish Page‑Cache from Real Leak
If used is high but available remains stable, suspect page cache.
If available falls and buff/cache does not grow, proceed to process‑level analysis.
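The two rules above can be captured as a small decision helper. The function name `classify_trend` and its thresholds are illustrative assumptions, sketching the heuristic rather than a fixed rule:

```shell
#!/bin/sh
# Hypothetical heuristic for step 3: given MemAvailable and buff/cache
# readings (kB) from two points in time, suggest where to look next.
classify_trend() {
  avail_then=$1; avail_now=$2; cache_then=$3; cache_now=$4
  avail_drop=$((avail_then - avail_now))
  cache_growth=$((cache_now - cache_then))
  if [ "$avail_drop" -le 0 ]; then
    echo "stable: available is not falling; no leak indicated"
  elif [ "$cache_growth" -ge "$avail_drop" ]; then
    echo "cache: available fell but buff/cache grew comparably"
  else
    echo "leak-suspect: available fell without matching cache growth"
  fi
}
```

Feeding it two samples taken a few minutes apart (e.g., parsed from `grep MemAvailable /proc/meminfo`) mechanizes the "cache vs. real leak" call before moving on to process-level analysis.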
4. Identify the Real Memory Consumer with smem
smem -P thumbnail-svc -c "pid pss uss rss" | head -20
Focus on processes where USS (or PSS) is large and increasing. Sample repeatedly (e.g., every 30 s for several minutes) to ensure the growth is sustained, not a transient spike.
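The repeated-sampling step lends itself to a snapshot-and-diff pattern. The helper below (`uss_delta` is a hypothetical name) joins two saved "pid uss" snapshots and prints per-PID USS growth; the commented smem invocation shows how such snapshots might be captured:

```shell
#!/bin/sh
# Sketch of the repeated-sampling idea from step 4. Snapshots would be
# taken with something like:
#   smem -P thumbnail-svc -c "pid uss" > snap_$(date +%s).txt
# uss_delta then reports PIDs whose USS grew between two snapshots.
uss_delta() {
  # $1 = earlier snapshot, $2 = later snapshot (columns: pid uss)
  awk 'NR==FNR { before[$1] = $2; next }
       ($1 in before) && ($2 > before[$1]) {
         printf "pid=%s uss_growth_kb=%d\n", $1, $2 - before[$1]
       }' "$1" "$2"
}
```

Running the diff over snapshots 30 s apart for several minutes distinguishes sustained growth from a transient spike, as the step above requires.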
5. Drill Down into Mapping Types
For the suspect PID, inspect anonymous and dirty pages:
pmap -x $PID | tail -20
Look for lines with [ anon ] and high Dirty values – these indicate private anonymous pages.
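To quantify whether the [ anon ] regions are actually growing, two pmap snapshots can be compared. This hypothetical helper (`anon_rss_kb` is an assumed name, in the spirit of the pmap_growth_diff.sh script mentioned later) sums the RSS of anonymous mappings in a saved `pmap -x` dump:

```shell
#!/bin/sh
# Hypothetical helper for step 5: sum the RSS (kB, third column) of
# anonymous mappings in a saved `pmap -x $PID` snapshot.
anon_rss_kb() {
  # pmap -x columns: Address Kbytes RSS Dirty Mode Mapping
  awk '/\[ anon \]/ { sum += $3 } END { print sum + 0 }' "$1"
}

# Usage sketch: snapshot twice, ~60 s apart, and compare the sums.
# pmap -x "$PID" > before.txt; sleep 60; pmap -x "$PID" > after.txt
# echo "anon RSS growth: $(( $(anon_rss_kb after.txt) - $(anon_rss_kb before.txt) )) kB"
```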
Aggregate smaps fields to confirm:
grep -E '^(Size|Rss|Pss|Private_Dirty|Anonymous):' /proc/$PID/smaps | \
awk '{sum[$1]+=$2} END {for (k in sum) printf "%s=%dKB\n", k, sum[k]}'
Dominant Anonymous and Private_Dirty values point to a true leak in user‑space allocations.
6. Pinpoint Allocation Hotspots with perf
# Quick view
perf top -p $PID -g --call-graph dwarf
# Record a short session (e.g., 30 s)
perf record -F 99 -g -p $PID -- sleep 30
perf report --stdio | head -80
Typical hotspots such as allocate_decode_buffer and tiff_decode_tiles in libimgdecode.so indicate the native decoder is allocating memory without releasing it on error paths.
7. Root‑Cause Confirmation
The leak originates from a new libimgdecode.so version that fails to free intermediate buffers when decoding large TIFF images. Each worker accumulates anonymous private dirty pages, causing per‑container OOMKilled events and node‑wide memory exhaustion.
8. Immediate Mitigation
Roll back the deployment to the previous image.
Throttle large‑image requests (reduce concurrency, limit dimensions).
Drain the most affected node while keeping one instance for continued evidence collection.
Preserve a failing pod to keep perf and smaps data.
9. Post‑Mitigation Validation
Verify free -h shows stable MemAvailable for >1 hour.
Confirm smem USS/PSS no longer shows monotonic growth.
Run the high‑resolution image workload and ensure no OOMKilled events.
Check that perf no longer reports allocation hotspots.
Automation Scripts
Several helper scripts are provided to standardize evidence collection (e.g., mem_scene_collect.sh, smaps_rollup.py, pmap_growth_diff.sh, oom_evidence_pack.sh, cgroup_mem_snapshot.sh, perf_capture_prepare.sh). They capture free, vmstat, smem, pmap, smaps, kernel logs, cgroup metrics, and short perf recordings.
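The collectors themselves are not reproduced here; the following is a minimal sketch of what something like mem_scene_collect.sh could capture (the script name comes from the list above, but this body is an assumption, not the original script):

```shell
#!/bin/sh
# Minimal sketch of an evidence collector in the spirit of
# mem_scene_collect.sh: snapshot point-in-time memory state from /proc
# into a timestamped directory for later post-mortem analysis.
collect_mem_evidence() {
  outdir="${1:-/tmp/mem-evidence-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$outdir"
  cp /proc/meminfo "$outdir/meminfo"
  cp /proc/vmstat  "$outdir/vmstat"
  # PSI is absent on older kernels; tolerate the failure.
  cp /proc/pressure/memory "$outdir/psi_memory" 2>/dev/null || true
  # Per-process detail (smem, pmap, smaps, perf) would be added here.
  echo "$outdir"
}
```

Emitting the output directory on stdout lets a wrapper script chain the snapshot into an evidence pack such as oom_evidence_pack.sh.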
Best Practices & Pitfalls
Never rely solely on RSS; always examine PSS / USS for true private usage.
High buff/cache does not guarantee safety – check available and pressure metrics.
In container environments, map container PID to host PID before using pmap or perf.
When cgroup v2 is enabled, monitor memory.current and memory.events alongside pod metrics.
Limit perf recording duration in production to avoid overhead.
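On the container-to-host PID mapping pitfall: one kernel-level source of truth is the NSpid line in /proc/&lt;host_pid&gt;/status, which lists the PID in every namespace, host first and innermost container last. The helper name below is a hypothetical illustration:

```shell
#!/bin/sh
# Hypothetical helper: print "host_pid container_pid" from the NSpid
# line of a /proc/<pid>/status file. The fields are tab-separated;
# field 2 is the host-namespace PID, the last field is the PID as
# seen inside the innermost (container) namespace.
nspid_mapping() {
  awk -F'\t' '/^NSpid:/ { print $2, $NF }' "$1"
}

# On the host, find the host PID whose in-container PID is e.g. 1:
# for s in /proc/[0-9]*/status; do nspid_mapping "$s"; done | awk '$2 == 1'
```

With the host PID in hand, pmap and perf can be run from the node as shown in steps 5 and 6.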
Monitoring Recommendations
Node layer: alert if MemAvailable < 8 % of total for 10 min.
Pod layer: alert on any OOMKilled event within 15 min.
Process layer: periodically sample USS/PSS; flag monotonic increase.
Business layer: correlate P99 latency spikes with large‑image request ratio.
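The process-layer check above hinges on spotting a monotonic trend in sampled USS values; a sketch of that test, with a hypothetical helper name:

```shell
#!/bin/sh
# Sketch of the process-layer monitor: given a whitespace-separated
# series of USS samples (kB), report whether the series is strictly
# increasing, i.e. a leak-shaped trend worth flagging.
is_monotonic_increase() {
  echo "$1" | awk '{
    for (i = 2; i <= NF; i++)
      if ($i <= $(i-1)) { print "no"; exit }
    print "yes"
  }'
}
```

In practice the sample series would come from periodic smem runs, and a few flat or declining samples should reset the alert to avoid paging on transient spikes.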
Summary
The workflow—starting with system‑wide health (free, vmstat), narrowing to process‑level private memory (smem), dissecting mapping types (pmap / smaps), and finally pinpointing allocation hotspots (perf)—enables rapid discrimination between true memory leaks, cache effects, and shared‑page artifacts. Applying the mitigation steps, validating with repeat measurements, and embedding the recommended alerts creates a repeatable on‑call process that prevents recurrence.
MaGe Linux Operations