
Master Linux Memory Troubleshooting: Detect Leaks and High Usage Efficiently

This guide walks Linux operators through the fundamentals of memory management, explains key metrics, and provides step‑by‑step instructions for using tools and interfaces such as top, free, vmstat, pmap, valgrind, smem, /proc/<pid>/smaps, and slabtop to pinpoint and resolve memory leaks and excessive memory consumption in production systems.


1. Linux Memory Basics

Before diving into troubleshooting, it is essential to understand how Linux manages memory, both virtually and physically, to interpret the metrics correctly.

1.1 Memory Management Mechanism

Linux separates virtual memory (the per‑process address space) from physical memory (actual RAM). The page table acts as a translation dictionary, mapping virtual pages to physical frames; with demand paging, a frame is assigned only when a page is first accessed, which keeps physical memory use efficient.

1.2 Key System Memory Metrics

MemTotal : total physical RAM installed.

MemFree : currently unused RAM.

MemAvailable : estimate of memory readily available for new processes, accounting for reclaimable cache.

Buffers : cache for block‑device I/O.

Cached : page cache for file data.

When MemAvailable drops near zero, the system is at risk of OOM: the kernel's OOM killer may start terminating processes to reclaim memory.
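A minimal sketch for checking this field directly (the 500 MB threshold is an arbitrary example, not a recommendation):

#!/bin/bash
# Read MemAvailable (value is in KB) straight from /proc/meminfo
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
# Warn when it falls below the example threshold of 500 MB
if [ "$avail_kb" -lt $((500 * 1024)) ]; then
  echo "WARNING: only ${avail_kb} KB available - possible OOM risk"
fi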

1.3 Causes of Memory Leaks and High Usage

Forgotten free / delete after malloc / new.

Frequent dynamic allocations leading to fragmentation.

Poor data‑structure choices with high space complexity.

Uncleared caches or buffers.

Contention in high‑concurrency environments.

Third‑party libraries with faulty memory handling.

2. Identifying Memory Hogs

2.1 Process‑Level Inspection

top : Press M to sort by memory (%MEM). Important columns:

RES : resident memory actually used.

%MEM : share of total physical RAM (computed from RES).
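For scripting or logging, top can also run non‑interactively; a sketch (the -o %MEM sort flag assumes a reasonably recent procps‑ng top):

top -b -n 1 -o %MEM | head -20   # one batch-mode snapshot, sorted by memory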

pmap : Shows the detailed memory map of a PID. Common options: pmap <pid>: basic map. pmap -x <pid>: extended output with per‑mapping RSS and Dirty columns. pmap -d <pid>: device‑format output (offset and device numbers). pmap -q <pid>: quiet output for scripting.
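To see whether a process keeps growing, the summary line of pmap -x can be polled; a sketch (PID 1234 is a placeholder):

watch -n 10 "pmap -x 1234 | tail -n 1"   # re-check the "total kB" line every 10 s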

htop : Enhanced UI, supports mouse sorting and tree view (F5).

smem : Reports USS (memory unique to the process), PSS (RSS with shared pages divided proportionally among their users), and RSS per process. Example output (smem -k for human‑readable units):

 PID User     Command    Swap      USS      PSS      RSS
1234 root     ./myapp      0B     120M     300M     350M

free : Quick snapshot of overall memory. Example (free -h for human‑readable units):

               total        used        free      shared  buff/cache   available
Mem:            7.7G        2.3G        1.2G        400M        4.2G        4.9G
Swap:           2.0G          0B        2.0G

vmstat : Shows system‑wide statistics. The command vmstat 2 10 samples every 2 seconds, 10 times, producing:

procs ---------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa st
2  0      0 200000  50000 700000   0    0   10   20  100  200  5  3 88  4  0
3  0      0 180000  45000 680000   0    0   12   22  105  210  6  3 87  4  0
...
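To confirm that one specific process, rather than the system as a whole, is growing, a simple polling loop over ps is often enough; a sketch (the PID is a placeholder):

#!/bin/bash
# Log the RSS (in KB) of PID 1234 with a timestamp every 5 s, until it exits
while kill -0 1234 2>/dev/null; do
  echo "$(date +%T) $(ps -o rss= -p 1234)"
  sleep 5
done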

2.2 Memory‑Leak Detection

valgrind (Memcheck) : Detects leaks, invalid reads/writes, and use‑after‑free. Install via package manager, compile with -g, then run:

valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all ./myapp

Sample output:

==12345== HEAP SUMMARY:
==12345==     in use at exit: 128 bytes in 2 blocks
==12345==   total heap usage: 5 allocs, 3 frees, 2048 bytes allocated
==12345== 64 bytes in 1 blocks are definitely lost
==12345==    at 0x4C2B0E0: malloc (vgpreload_memcheck-amd64-linux.so)
==12345==    by 0x4005BD: main (test.c:10)
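To see such a report end to end, a deliberately leaky program can be built and checked; a sketch (test.c and myapp mirror the names in the output above, though exact byte counts and line numbers will differ):

# Write a tiny program that allocates 64 bytes and never frees them
cat > test.c <<'EOF'
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *buf = malloc(64);  /* allocated...        */
    strcpy(buf, "leaked");   /* ...used...          */
    return 0;                /* ...and never freed  */
}
EOF
gcc -g -O0 -o myapp test.c           # -g gives file:line in the report
valgrind --tool=memcheck --leak-check=full ./myapp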

Compiler sanitizers : -fsanitize=address (ASan) – detects out‑of‑bounds accesses and use‑after‑free, and LeakSanitizer (LSan) runs automatically at exit to report leaks. -fsanitize=memory (MSan) – detects use of uninitialized memory (Clang only, and requires recompiling all dependencies).
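A minimal ASan workflow, reusing the test.c sketch above (on Linux, LeakSanitizer reports leaks automatically when the process exits):

gcc -g -fsanitize=address -fno-omit-frame-pointer -o myapp_asan test.c
./myapp_asan   # the leak report prints at exit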

strace : Trace memory‑related syscalls. Example:

strace -p 5678 -e trace=mmap,munmap,brk
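Adding -c turns the trace into a per‑syscall count summary, which is easier to compare across intervals; a sketch (PID 5678 is the placeholder from above):

timeout 30 strace -p 5678 -e trace=mmap,munmap,brk -c   # summary prints when strace detaches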

3. Precise Root‑Cause Location

3.1 Using valgrind for Deep Inspection

After confirming a leak with basic tools, run valgrind on the target binary compiled with -g. The detailed report pinpoints the source file and line number of each leak.

3.2 memstat for Detailed Usage

memstat, built from source, reports VSS, RSS, SHR, USS for each process and can test memory stability by writing/reading patterned data.

3.3 Analyzing /proc/<pid>/smaps

smaps provides per‑segment details (Size, Rss, Pss, Private_Dirty, etc.). A four‑step workflow:

Identify the suspect PID with top or ps.

Save a baseline: cat /proc/5678/smaps > smaps_1.txt.

After a monitoring interval, repeat: cat /proc/5678/smaps > smaps_2.txt.

Diff the two files; a growing Private_Dirty in a [heap] segment indicates a heap leak.

Example script (saved as monitor_smaps.sh) captures the relevant fields every minute:

#!/bin/bash
# Periodically snapshot the smaps fields that reveal heap growth.
PID=1234
INTERVAL=60
SNAPSHOTS_DIR="./smaps_snapshots"
mkdir -p "$SNAPSHOTS_DIR"
# Stop automatically once the target process exits
while kill -0 "$PID" 2>/dev/null; do
  TIMESTAMP=$(date +%s)
  # Keep each segment's header line plus its Size, Rss, and Private_Dirty fields
  grep -E '^(Size|Rss|Private_Dirty):|^[0-9a-f]+-[0-9a-f]+ ' \
    "/proc/$PID/smaps" > "$SNAPSHOTS_DIR/smaps_$TIMESTAMP.txt"
  sleep "$INTERVAL"
done
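Once two or more snapshots exist, a diff shows which fields moved; a sketch (the timestamps in the filenames are whatever the script produced):

diff smaps_snapshots/smaps_1700000000.txt \
     smaps_snapshots/smaps_1700000300.txt | grep Private_Dirty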

3.4 Kernel‑Space Tools: slabtop and Friends

slabtop visualizes kernel slab caches. Important columns (matching its on‑screen header):

OBJS : total objects allocated in the cache.

ACTIVE : objects currently in use.

USE : ratio of active to total objects.

NAME : slab cache name (e.g., dentry, inode_cache).

Rapid growth of OBJS in a cache whose ACTIVE count stays close to OBJS (few objects ever freed) often signals a kernel memory leak.

Other kernel tools: cat /proc/slabinfo – static snapshot for scripting. crash – analyzes kernel core dumps. SystemTap and dtrace – dynamic probes for complex kernel‑level leaks.
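For a one‑shot, scriptable view of the same data, a sketch (-o prints once and exits; -s c sorts by cache size):

slabtop -o -s c | head -15   # top slab caches by total size, no interactive screen
head -2 /proc/slabinfo       # raw counters behind slabtop, including the header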

4. Practical Case Study

A web service on an 8 GB server showed degraded response time. Initial free -h output:

               total        used        free      shared  buff/cache   available
Mem:            7.7G        6.8G        0.2G        100M        0.7G        0.5G
Swap:           2.0G          0B        2.0G

Repeated vmstat 2 10 samples showed free memory steadily decreasing and a growing run queue, consistent with mounting memory pressure.

Using ps aux --sort=-%mem | head -10 identified an apache2 process (PID 1234) consuming 25 % of RAM (RSS ≈ 2 GB).

Detailed map with pmap -x 1234 revealed a large [anon] region (≈ 1.9 GB RSS) that kept growing.

A monitoring script captured /proc/1234/smaps snapshots; the Private_Dirty field rose from 100 MB to 500 MB, confirming a heap leak.

Code review uncovered a request‑handling module that allocated temporary buffers without freeing them. After fixing the deallocation and redeploying, memory usage stabilized and service performance recovered.

Recommended preventive measures:

Regular code reviews focusing on allocation/release.

Comprehensive stress testing (e.g., JMeter) before production.

Continuous memory monitoring with alerts (Prometheus + Grafana).

Adopt memory‑pool techniques to reduce fragmentation.

5. Common Pitfalls and Tips

Do not mistake high buff/cache for a leak; it is normal caching.

Remember that kernel memory (slab) is invisible to user‑process RSS; use slabtop to investigate.

Prefer RES/RSS or PSS over VIRT/VSZ when ranking processes; virtual size counts mappings that may never occupy physical RAM (top's %MEM is already derived from RES). RSS in turn double‑counts pages shared between processes, which PSS apportions fairly.

In container environments, inspect the cgroup files instead of the host's free output: e.g., /sys/fs/cgroup/memory/.../memory.usage_in_bytes under cgroup v1, or memory.current under cgroup v2, as sketched below.
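A sketch that reads usage under either cgroup version (paths differ between v1 and v2):

cat /sys/fs/cgroup/memory/memory.usage_in_bytes 2>/dev/null \
  || cat /sys/fs/cgroup/memory.current   # fall back to the cgroup v2 file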
