
How Memory Leaks Sneak Into Your System and How to Stop Them

This article explains why memory leaks act like invisible thieves that gradually fill a process's resident memory (RSS), outlines their four-step attack process, shows how to spot the tell-tale signs using process-level and system-level metrics, and provides practical emergency and preventive measures to protect your applications.


Memory as a Warehouse

In an operating system, memory works like a high‑speed warehouse where every running program temporarily stores data. Two main shelves exist: the shared page cache (public temporary storage) and the private RSS (resident set size) that belongs exclusively to each process.

Why the Private Shelf Is the Perfect Target

The page cache is constantly trimmed and reclaimed by the OS, so a leak rarely hides there. A process's private RSS, however, is not cleaned up on its behalf: only the process itself (or its runtime's garbage collector) can free that memory, so leaked objects accumulate there unchecked.
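
To make the private shelf concrete, here is a minimal sketch of a process sampling its own RSS. It assumes a Linux host, where the kernel exposes the figure as the VmRSS line in /proc/self/status; the RssProbe class name is just for illustration.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Minimal sketch: sample this process's resident set size (RSS) on Linux
    // by reading the VmRSS line from /proc/self/status. On other platforms the
    // file does not exist and the method returns -1.
    public class RssProbe {

        public static long currentRssKiB() {
            try {
                for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
                    if (line.startsWith("VmRSS:")) {
                        // The line looks like: "VmRSS:    123456 kB"
                        return Long.parseLong(line.trim().split("\\s+")[1]);
                    }
                }
            } catch (IOException | NumberFormatException e) {
                // /proc not available (non-Linux host) or unexpected format
            }
            return -1;
        }

        public static void main(String[] args) {
            System.out.println("RSS now: " + currentRssKiB() + " KiB");
        }
    }

Logging this value at a fixed interval is enough to see whether the private shelf is breathing with the workload or only ever filling up.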

Four‑Step Leak Attack

Stealthy Observation: the leak lies low at start-up, blending into the program's normal task cycles and the moments when memory is supposed to be released.

Initial Hoarding: small batches of objects that are no longer needed are left behind in RSS instead of being freed.

Mass Accumulation: as the workload grows, more and more garbage piles up, producing a steep rise in RSS usage (a minimal sketch of this pattern follows this list).

Out-of-Memory Collapse: when the ever-growing RSS exhausts available physical memory (and swap), the kernel invokes the OOM killer, terminating the offending process.
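
The whole progression usually boils down to one reachable reference that never goes away. The sketch below is a hypothetical Java illustration (the class and method names are invented for this example): a static list quietly keeps every processed order, so the garbage collector can never reclaim them and RSS only climbs until the OOM killer steps in.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical illustration of the hoarding and accumulation steps: every
    // task leaves its objects behind in a process-lifetime collection that is
    // never cleared, so private memory can only grow.
    public class OrderProcessor {

        // The "initial hoarding" spot: a static reference keeps every processed
        // order reachable, so the garbage collector can never free it.
        private static final List<String> PROCESSED = new ArrayList<>();

        static void handleOrder(String order) {
            // ... real work would happen here ...
            PROCESSED.add(order); // the leak: the result is kept "just in case"
        }

        public static void main(String[] args) {
            for (long i = 0; ; i++) {  // "mass accumulation" under load
                handleOrder("order-" + i);
            }
        }
    }

The fix is rarely exotic: either stop retaining what is no longer needed, or put the data in a bounded structure with an eviction policy.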

Detecting the Leak

Three key evidence points help confirm a leak:

RSS only increases: monitor with Task Manager, top, or ps. A healthy process shows a "tidal" pattern that rises and falls with load, while a leaky one climbs monotonically.

Garbage-collector churn: in Java, Python, and other managed runtimes, frequent Full GC cycles that reclaim less and less memory indicate that most of the heap is occupied by useless but still-referenced objects (see the sketch after this list).

Workload-memory mismatch: identical tasks consume progressively more memory (e.g., 100 MB → 300 MB for the same number of orders), revealing hidden accumulation.
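
Here is a minimal sketch of the garbage-collector churn check from the second point, using the JDK's standard management beans. It periodically logs how often the collectors have run and how much heap is still in use afterwards; if the collection count keeps climbing while the used-heap figure barely drops, most of the heap is occupied by objects the program is still (needlessly) holding on to.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    // Sketch of the "GC churn" check: log collection counts, collection time,
    // and used heap on a fixed interval. Frequent collections that reclaim
    // little memory point at a heap full of still-referenced garbage.
    public class GcChurnLogger {

        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            while (true) {
                long collections = 0;
                long timeMs = 0;
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    collections += gc.getCollectionCount();
                    timeMs += gc.getCollectionTime();
                }
                long usedMiB = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
                System.out.printf("GC runs=%d, GC time=%d ms, heap used=%d MiB%n",
                        collections, timeMs, usedMiB);
                Thread.sleep(10_000); // sample every ten seconds
            }
        }
    }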

System‑level signals include shrinking free memory, rising swap usage, and OOM‑killer logs (e.g., dmesg | grep -i oom).

Emergency Fixes

When a process becomes unresponsive, the quickest remedy is to restart it, which clears its RSS entirely. Before restarting, capture a memory snapshot (e.g., jmap for Java, tracemalloc for Python) for later analysis. If a restart is too disruptive, temporarily increase the process's memory limit (e.g., raise the JVM's -Xmx heap ceiling) to buy time for a proper fix.
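
On HotSpot-based JVMs the snapshot can also be triggered from inside the process, which helps when you want to automate the "snapshot, then restart" routine. A minimal sketch, assuming the com.sun.management.HotSpotDiagnosticMXBean is available; it writes the same kind of .hprof file that jmap would produce.

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.io.IOException;
    import java.lang.management.ManagementFactory;

    // Sketch: capture a heap snapshot from inside the process before a
    // controlled restart. Assumes a HotSpot-based JVM.
    public class HeapDumper {

        public static void dumpHeap(String file) throws IOException {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // live = true: dump only reachable objects, which is what a leak
            // investigation usually cares about.
            bean.dumpHeap(file, true);
        }

        public static void main(String[] args) throws IOException {
            dumpHeap("before-restart.hprof");
            System.out.println("Heap snapshot written to before-restart.hprof");
        }
    }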

Daily Inspection

Set up two monitoring layers:

Track RSS growth (e.g., with a process RSS gauge such as Prometheus's process_resident_memory_bytes) and alert when it grows by more than 50% in a week (a minimal in-process version of this check is sketched after this list).

Periodically export memory snapshots and audit for unusually large object counts (e.g., millions of OrderDTO instances).
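
For the first layer, here is a minimal in-process sketch of the growth check. It reuses the /proc-based RSS reading shown earlier and is Linux-specific; in practice the same rule usually lives in the monitoring system as an alert on an RSS gauge rather than inside the application.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Sketch of the first monitoring layer: remember a baseline RSS reading
    // and warn once the latest sample has grown by more than 50%.
    public class RssGrowthWatch {

        // Read this process's RSS (in KiB) from /proc/self/status; Linux only.
        static long currentRssKiB() throws IOException {
            for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
                if (line.startsWith("VmRSS:")) {
                    return Long.parseLong(line.trim().split("\\s+")[1]);
                }
            }
            return -1;
        }

        public static void main(String[] args) throws Exception {
            long baseline = currentRssKiB();
            while (true) {
                Thread.sleep(60_000); // sample once a minute
                long now = currentRssKiB();
                if (baseline > 0 && now > baseline * 1.5) { // more than 50% growth
                    System.err.println("RSS grew from " + baseline + " KiB to "
                            + now + " KiB - possible leak");
                }
            }
        }
    }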

Preventive Measures

To stop leaks at the source:

Adopt “use‑and‑release” patterns (try‑with‑resources, explicit close calls) for file handles, sockets, etc.

Configure caches with size limits and TTLs so they cannot grow without bound (this and the previous point are sketched after this list).

Regularly update and prune third‑party dependencies that may contain hidden leaks.
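
Here is a minimal Java sketch of the first two measures: try-with-resources releases the file handle the moment the block exits, and a LinkedHashMap with removeEldestEntry acts as a simple size-bounded cache (a production cache library would add TTL-based expiry on top; the class name and limit are illustrative).

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of "use-and-release" plus a bounded cache, so neither file handles
    // nor cached entries can pile up without limit.
    public class PreventiveMeasures {

        // Size-bounded LRU cache: once the map would exceed MAX_ENTRIES, the
        // eldest entry is evicted, so it can never grow without bound.
        private static final int MAX_ENTRIES = 10_000;
        private static final Map<String, String> CACHE =
                new LinkedHashMap<String, String>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                        return size() > MAX_ENTRIES;
                    }
                };

        // Use-and-release: the reader is closed as soon as the block exits,
        // whether it finishes normally or throws.
        static String firstLine(String path) throws IOException {
            try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
                return reader.readLine();
            }
        }

        public static void main(String[] args) throws IOException {
            CACHE.put("order-1", "cached payload");
            System.out.println(firstLine("/etc/hostname"));
        }
    }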

By monitoring, reacting quickly, and enforcing disciplined resource management, you can keep memory leaks from turning into catastrophic outages.

Tags: Monitoring, resource management, OOM killer, page cache, RSS
Written by

NiuNiu MaTe

Joined Tencent (nicknamed "Goose Factory") through campus recruitment at a second‑tier university. Career path: Tencent → foreign firm → ByteDance → Tencent. Started as an interviewer at the foreign firm and hopes to help others.
