Why Swap Stays Full on Linux and How to Release 29 GB Quickly
After noticing 14 GB free RAM but a fully used 29 GB swap, the article explains how to diagnose hidden swap consumption using tools like smem and swap_stat_show, clarifies the difference between free’s Swap used and /proc/*/status VmSwap, and shows how adjusting transparent_hugepage defrag and toggling swap can reclaim the space.
This article is part of a "Performance Optimization in Practice" series that focuses on real‑world production issues, following an earlier "Linux Performance Tuning" series.
Problem Statement
The free -wh command shows 14 GB of free physical memory while the swap partition of 29 GB is completely used:
# free -wh
              total   used   free   shared   buffers   cache   available
Mem:           125G    28G    14G      34G      352M     81G         79G
Swap:           29G    29G     0B
Previous checks with iostat, top, and vmstat indicated no abnormal resource consumption, and all applications appeared to run normally.
Hypothesis
When the system experiences high memory pressure, many pages are swapped out. After the pressure subsides, those pages may not be reclaimed promptly, possibly due to a process‑related issue. Therefore, we need to identify which processes are actually using swap.
Tools for Investigation
Two utilities are used:
smem – a memory reporting tool.
swap_stat_show – a Python script that reports per‑process swap usage.
Running swap_stat_show shows no process occupying swap.
Running smem -p -s swap confirms that the swap column for all processes is 0.00%:
# smem -p -s swap
  PID User   Command                          Swap     USS     PSS     RSS
    1 root   /sbin/init                      0.00%   0.00%   0.00%   0.00%
  889 root   /lib/systemd/systemd-journal    0.00%   0.01%   0.06%   0.11%
... (remaining output omitted, all swap values are 0.00%)
Thus, processes occupy only a negligible amount of swap, yet the system reports 29 GB of used swap. We need to understand the discrepancy between the free command’s “Swap used” value and the per‑process VmSwap field found in /proc/*/status.
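The per-process check above can be sketched in a few lines of Python. This is not the source of swap_stat_show, which the article does not reproduce; it is a minimal illustration, assuming the standard Name and VmSwap fields of /proc/<pid>/status. The function names parse_status and per_process_swap are hypothetical.

```python
import re
from pathlib import Path

# Field layout assumed from /proc/<pid>/status, e.g. "VmSwap:\t    2048 kB".
_VMSWAP = re.compile(r"^VmSwap:\s+(\d+)\s+kB", re.MULTILINE)
_NAME = re.compile(r"^Name:\s+(.+)$", re.MULTILINE)


def parse_status(text):
    """Return (name, swap_kb) from the text of one /proc/<pid>/status file."""
    name = _NAME.search(text)
    swap = _VMSWAP.search(text)
    # Kernel threads have no VmSwap line; count them as 0 kB.
    return (
        name.group(1).strip() if name else "?",
        int(swap.group(1)) if swap else 0,
    )


def per_process_swap(proc_root="/proc"):
    """Return a list of (pid, name, swap_kb), largest swap user first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            text = (entry / "status").read_text()
        except OSError:
            continue  # process exited or is not readable
        name, swap_kb = parse_status(text)
        rows.append((int(entry.name), name, swap_kb))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Calling per_process_swap()[:10] as root lists the ten largest swap users. On the system described here, every entry would be expected to show a value at or near 0 kB.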
Key Definitions
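free (Swap used) : The amount of swap space whose slots are currently allocated, that is, SwapTotal minus SwapFree in /proc/meminfo. A slot stays allocated until the kernel releases it, so the figure includes pages that were written to swap and later read back into RAM but whose swap copy has not yet been freed or overwritten.
Process VmSwap (from /proc/*/status) : The amount of a process’s anonymous private memory that currently resides only in swap. Pages that exist simultaneously in RAM and swap are not counted here, and swapped‑out shared memory (shmem/tmpfs) is not included either.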
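To make the gap between the two figures concrete, the sketch below computes the system-wide value the way free does (SwapTotal minus SwapFree) and subtracts the sum of per-process VmSwap values. The helper names and the sample numbers in the usage note are illustrative assumptions, not measurements from the system in this article.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_gap_kb(meminfo_text, vmswap_values_kb):
    """Return (used_kb, attributed_kb, unattributed_kb).

    used_kb         -- SwapTotal - SwapFree, the "Swap used" figure
    attributed_kb   -- sum of the per-process VmSwap values passed in
    unattributed_kb -- swap in use that no process accounts for
    """
    info = parse_meminfo(meminfo_text)
    used = info["SwapTotal"] - info["SwapFree"]
    attributed = sum(vmswap_values_kb)
    return used, attributed, used - attributed
```

For example, with a hypothetical SwapTotal of 30408704 kB, a SwapFree of 0 kB, and per-process values summing to 1024 kB, the third element of the result is 30407680 kB. A large unattributed value is the signature of the situation described in this article. The SwapCached field returned by parse_meminfo is also worth reading alongside it.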
Reasoning Process
The system once experienced severe memory pressure. With 125 GB of RAM exhausted, the kernel swapped out a large number of inactive pages (up to 29 GB) to the swap partition, which free reports as used.
Later, as the load decreased, the kernel read those pages back into RAM when they were accessed. For each process, the VmSwap value drops to near zero because the data now resides in RAM.
However, the kernel does not erase the old copies from the swap area. It keeps them as a “swap cache” (reported as SwapCached in /proc/meminfo), so that an unmodified page can be evicted again without being rewritten. The free command (and /proc/meminfo) still counts these swap slots as used until they are released or overwritten by new swapped‑out data.
To verify this behavior, we inspect the transparent hugepage defragmentation setting, which can affect how memory is reclaimed.
# cat /sys/kernel/mm/transparent_hugepage/defrag
always defer defer+madvise [madvise] never
always : When the system cannot allocate a transparent huge page, it pauses allocation and waits for memory reclamation and compaction before proceeding.
defer : Falls back to regular 4 KB pages, wakes kswapd and kcompactd for background reclamation and compaction, and later merges pages into huge pages when possible.
madvise : Applies the “always” behavior only to memory regions explicitly marked with MADV_HUGEPAGE via madvise().
defer+madvise : Combines the “defer” fallback with “always” for MADV_HUGEPAGE regions.
never : Disables transparent hugepage defragmentation.
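In the sysfs output above, the option in square brackets is the one currently in effect (madvise on this machine). When auditing many hosts, it can be handy to extract that value programmatically. The following is a small sketch with hypothetical function names; only the sysfs path comes from the article.

```python
import re


def active_thp_mode(sysfs_text):
    """Return the bracketed (currently active) option from a THP sysfs file."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active option found in {sysfs_text!r}")
    return match.group(1)


def read_defrag_mode(path="/sys/kernel/mm/transparent_hugepage/defrag"):
    """Read the defrag setting from sysfs and return the active option."""
    with open(path) as handle:
        return active_thp_mode(handle.read())
```

The same parser works for the sibling file /sys/kernel/mm/transparent_hugepage/enabled, which uses the same bracket convention.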
Setting the defrag mode to defer can help the kernel release swap cache more promptly.
# Temporary change
echo defer > /sys/kernel/mm/transparent_hugepage/defrag
# Permanent change: the transparent_hugepage= kernel boot parameter only sets the "enabled" mode (always, madvise, or never) and does not control defrag, so re-apply the echo above at boot, for example from a systemd unit or /etc/rc.local.
To immediately free the 29 GB of swap, you can temporarily turn swap off and on again. This operation should be performed during a low‑traffic window to minimize impact.
sudo swapoff -a
sudo swapon -a
Both smem and swap_stat_show are open‑source tools that can be obtained from their respective repositories.
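Because swapoff -a has to pull every page that still lives only in swap back into RAM, it is worth confirming beforehand that available memory can absorb the used swap. The check below is a minimal sketch: the function name and the default 1 GiB headroom are assumptions chosen for illustration, not values from the article.

```python
def swapoff_is_safe(meminfo_text, headroom_kb=1024 * 1024):
    """Return True if MemAvailable covers all used swap plus some headroom.

    headroom_kb is an arbitrary safety margin (1 GiB by default); tune it
    to the workload rather than treating it as a recommendation.
    """
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    used_swap_kb = info["SwapTotal"] - info["SwapFree"]
    return info["MemAvailable"] >= used_swap_kb + headroom_kb
```

A typical call is swapoff_is_safe(open("/proc/meminfo").read()). With the figures shown by free in this article (79 GB available, 29 GB of swap used), the check passes. If it fails, running swapoff risks invoking the OOM killer.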
Tech Stroll Journey
The philosophy behind "Stroll": continuous learning, curiosity‑driven, and practice‑focused.
