
Why Swap Stays Full on Linux and How to Release 29 GB Quickly

After noticing 14 GB free RAM but a fully used 29 GB swap, the article explains how to diagnose hidden swap consumption using tools like smem and swap_stat_show, clarifies the difference between free’s Swap used and /proc/*/status VmSwap, and shows how adjusting transparent_hugepage defrag and toggling swap can reclaim the space.

Tech Stroll Journey

This article is part of a "Performance Optimization in Practice" series that focuses on real‑world production issues, following an earlier "Linux Performance Tuning" series.

Problem Statement

The free -wh command shows 14 GB of free physical memory while the swap partition of 29 GB is completely used:

# free -wh
               total        used        free      shared     buffers       cache   available
Mem:           125G         28G         14G         34G        352M        81G        79G
Swap:           29G         29G          0B

Previous checks with iostat, top, and vmstat indicated no abnormal resource consumption, and all applications appeared to run normally.

Hypothesis

When the system experiences high memory pressure, many pages are swapped out. After the pressure subsides, the swap space those pages occupy may not be released promptly, possibly because some process still holds it. Therefore, we need to identify which processes are actually using swap.

Tools for Investigation

Two utilities are used:

smem – a memory reporting tool.

swap_stat_show – a Python script that reports per‑process swap usage.
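To make the per-process numbers concrete, here is a minimal sketch of how such a report can be produced, assuming it simply reads the VmSwap line from each /proc/<pid>/status file. It is not the actual swap_stat_show source, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# VmSwap is reported in kB; kernel threads have no mm, so their status has no Vm* lines.
VMSWAP_RE = re.compile(r"^VmSwap:\s+(\d+)\s+kB$", re.MULTILINE)
NAME_RE = re.compile(r"^Name:\s+(.*)$", re.MULTILINE)


def parse_vmswap_kb(status_text: str) -> int:
    """Return VmSwap in kB from /proc/<pid>/status text, or 0 if the line is absent."""
    match = VMSWAP_RE.search(status_text)
    return int(match.group(1)) if match else 0


def per_process_swap(proc_root: str = "/proc") -> list[tuple[int, str, int]]:
    """Return (pid, command name, VmSwap kB) for each readable process, largest first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo or /proc/self
        try:
            text = (entry / "status").read_text()
        except OSError:
            continue  # the process exited meanwhile, or the file is not readable
        name = NAME_RE.search(text)
        rows.append((int(entry.name), name.group(1) if name else "?", parse_vmswap_kb(text)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

On a live system, per_process_swap()[:10] lists the ten largest swap users, and summing the third column gives the total swap attributed to processes.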

Running swap_stat_show shows no process occupying swap.

Running smem -p -s swap confirms that the swap column for all processes is 0.00%:

# smem -p -s swap
  PID User     Command                               Swap      USS      PSS      RSS
    1 root     /sbin/init                            0.00%    0.00%    0.00%    0.00%
  889 root     /lib/systemd/systemd-journal          0.00%    0.01%    0.06%    0.11%
  ... (remaining output omitted, all swap values are 0.00%)

Thus, processes account for only a negligible amount of swap, yet the system reports 29 GB of used swap. To explain the gap, we need to understand the difference between the free command’s “Swap used” value and the per‑process VmSwap field found in /proc/*/status.

Key Definitions

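free (Swap used): SwapTotal minus SwapFree from /proc/meminfo, i.e. all swap slots that are currently allocated. A slot remains allocated as long as something still references it, including pages that have already been read back into RAM but keep their copy in swap (the swap cache, reported as SwapCached in /proc/meminfo).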

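Process VmSwap (from /proc/*/status): Shows how much of a specific process’s memory is currently swapped out, i.e. present only in swap. It is derived from swap entries in the process’s own page tables, so pages resident in RAM are not counted even if a copy remains in swap. Swapped‑out shared memory (tmpfs or SysV shared memory) is tracked by the shared‑memory object rather than the process’s page tables, so it does not appear in VmSwap either.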
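The two definitions can be compared directly. The sketch below assumes a Linux /proc layout and computes free’s Swap used as SwapTotal minus SwapFree from /proc/meminfo (which is how free derives that column), then subtracts the VmSwap total attributed to processes. The function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])  # drop the trailing "kB" unit if present
    return fields


def swap_used_kb(meminfo: dict[str, int]) -> int:
    """free's 'Swap used' column: allocated swap slots, SwapTotal - SwapFree."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def unattributed_swap_kb(meminfo: dict[str, int], vmswap_kb: list[int]) -> int:
    """Used swap that no process's VmSwap accounts for (swap cache, shared memory, ...)."""
    return max(0, swap_used_kb(meminfo) - sum(vmswap_kb))
```

Fed with the contents of /proc/meminfo and the per-process VmSwap values, a result close to Swap used means almost none of the allocated swap is attributed to any process.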

Reasoning Process

At some earlier point, the system was under severe memory pressure. With the 125 GB of RAM exhausted, the kernel swapped out a large number of inactive pages (up to the full 29 GB) to the swap partition, and free reports that space as used.

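Later, as the load decreased and processes touched those pages again, the kernel read them back into RAM. For each process, the VmSwap value drops to near zero because the data now resides in RAM. These pages return as ordinary process memory, not page cache, so they do not account for the 81 GB cache figure; free’s cache column is Cached plus SReclaimable from /proc/meminfo, and Cached also includes the 34 GB of shared memory.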

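However, the kernel does not necessarily release a swap slot when it reads a page back. A page that is swapped in but not modified can stay in the swap cache and keep its slot, so that it can be evicted again later without another write. The free command (and /proc/meminfo) counts these slots as used swap until they are released, for example when the page is modified or when its owner frees the memory or exits.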
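A quick way to check this explanation is to see how much of the used swap is also resident in RAM. /proc/meminfo reports that amount as SwapCached, so Swap used minus SwapCached approximates the data that lives only in swap. A minimal sketch, assuming the meminfo fields have already been parsed into a dict of kB values (the function name is hypothetical):

```python
def swap_breakdown(meminfo: dict[str, int]) -> dict[str, int]:
    """Split allocated swap (kB) into the part also held in RAM and the part only in swap."""
    used = meminfo["SwapTotal"] - meminfo["SwapFree"]
    # Swap-cache pages hold both a swap slot and a RAM copy; clamp in case of racy reads.
    cached = min(meminfo.get("SwapCached", 0), used)
    return {"used_kb": used, "also_in_ram_kb": cached, "swap_only_kb": used - cached}
```

If also_in_ram_kb covers most of used_kb, the swap-cache explanation holds; if swap_only_kb remains large, the gap has another cause.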

Next, we inspect the transparent hugepage (THP) defragmentation setting, which controls how aggressively the kernel reclaims and compacts memory when it tries to allocate huge pages.

# cat /sys/kernel/mm/transparent_hugepage/defrag
always defer defer+madvise [madvise] never
The value in brackets is the active mode, here madvise (the upstream kernel default). The available modes are:

always – When a transparent huge page cannot be allocated immediately, the faulting task stalls for direct reclaim and compaction before proceeding.

defer – The allocation falls back to regular 4 KB pages; kswapd and kcompactd are woken to reclaim and compact memory in the background, and khugepaged later collapses the regions into huge pages where possible.

madvise – Applies the always behavior only to memory regions explicitly marked with MADV_HUGEPAGE via madvise().

defer+madvise – Uses the always behavior for MADV_HUGEPAGE regions and the defer behavior for all other regions.

never – Disables defragmentation (no direct reclaim or compaction) for huge‑page allocations; it does not disable THP itself.
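Sysfs multiple-choice files such as this one list every accepted value and mark the active one with brackets. Below is a minimal sketch, with hypothetical helper names, that reads the active mode and refuses to write a value the kernel does not list. Writing to the real file requires root.

```python
from pathlib import Path

DEFRAG_PATH = "/sys/kernel/mm/transparent_hugepage/defrag"


def parse_choice(text: str) -> tuple[str, list[str]]:
    """Return (active mode, all modes) from text such as 'always defer [madvise] never'."""
    active = None
    modes = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets that mark the active value
            active = token
        modes.append(token)
    if active is None:
        raise ValueError(f"no bracketed active value in {text!r}")
    return active, modes


def set_defrag(mode: str, path: str = DEFRAG_PATH) -> str:
    """Write `mode` after checking it is one of the listed options; return the previous mode."""
    previous, modes = parse_choice(Path(path).read_text())
    if mode not in modes:
        raise ValueError(f"{mode!r} is not one of {modes}")
    Path(path).write_text(mode)
    return previous
```

For example, set_defrag("defer") returns the previous mode, which makes it easy to restore the original setting later.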

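In this case, the defrag mode was switched to defer with the aim of letting the kernel release swap cache more promptly. Treat this as a tuning choice rather than a guaranteed fix: defer stops huge‑page allocations from stalling in direct reclaim and compaction, but it does not free swap slots that are already allocated.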

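# Temporary change (run as root; lost on reboot)
echo defer > /sys/kernel/mm/transparent_hugepage/defrag

# Persistent change: the transparent_hugepage= boot parameter only accepts
# always, madvise or never (the "enabled" setting), not a defrag mode, so
# reapply the value at boot instead, for example with a systemd-tmpfiles
# entry in /etc/tmpfiles.d/thp-defrag.conf (the file name is arbitrary):
w /sys/kernel/mm/transparent_hugepage/defrag - - - - defer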

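To release the 29 GB of swap quickly, you can turn swap off and on again. swapoff has to read every page that exists only in swap back into RAM, so first confirm that MemAvailable comfortably exceeds Swap used minus SwapCached; otherwise swapoff may fail or trigger the OOM killer. Here, 79 GB available against at most 29 GB of swap leaves ample headroom. Perform the operation during a low‑traffic window to minimize impact.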

sudo swapoff -a
sudo swapon -a
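Because swapoff must bring every swap-only page back into RAM, it is worth checking the headroom first. Below is a minimal sketch of such a pre-flight check, assuming the meminfo fields are already parsed into a dict of kB values. The 10% safety margin and the function name are arbitrary, hypothetical choices, and MemAvailable is itself only a kernel estimate.

```python
def swapoff_headroom_kb(meminfo: dict[str, int], margin: float = 0.10) -> int:
    """Return the MemAvailable (kB) expected to remain after swapoff, minus a safety reserve.

    Pages already in the swap cache (SwapCached) have a RAM copy, so only
    SwapTotal - SwapFree - SwapCached must be read back in. Negative means risky.
    """
    needed = meminfo["SwapTotal"] - meminfo["SwapFree"] - meminfo.get("SwapCached", 0)
    reserve = int(meminfo["MemTotal"] * margin)  # hypothetical safety margin
    return meminfo["MemAvailable"] - max(needed, 0) - reserve
```

A positive result suggests swapoff -a can proceed; a negative one means the system may not have enough memory to absorb the swapped-out pages.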

Both smem and swap_stat_show are open‑source tools that can be obtained from their respective repositories.
