How SysOM Uncovers Hidden Memory Usage in Cloud‑Native Environments
In cloud-native deployments, container abstraction hides where memory actually goes: file page cache piles up, SReclaimable grows, stale cgroups leak kernel memory, and driver-allocated buffers never appear in container metrics. SysOM's non-intrusive, low-overhead diagnostics map pages to inodes and containers to pinpoint the root causes quickly.
Background
In cloud-native environments, containers hide many memory allocations from traditional process-level metrics. Hidden consumption (file page cache, kernel reclaimable caches, stale cgroup entries, GPU/DMA buffers) can cause memory pressure, latency spikes, and OOM events that are hard to detect with log-based or manual node-by-node analysis.
Pain Points
File cache overload – Excessive page cache delays I/O responses and influences Kubernetes scheduling decisions.
SReclaimable growth – Kernel‑maintained reclaimable caches consume physical RAM without appearing in container‑level metrics, leading to mis‑judged memory pressure.
cgroup leakage – Stale cgroup directories after rapid pod turnover occupy kernel memory and corrupt monitoring data.
Invisible memory – GPU drivers, NICs, and RDMA devices allocate memory that is invisible to tools like top, increasing OOM risk for AI training and other high-performance workloads.
Solution Overview (SysOM 2.0)
SysOM 2.0 provides a unified, non‑intrusive diagnostic capability that scans the host, container runtime and application processes to produce a full‑stack memory view. It identifies abnormal patterns without modifying business code, dramatically improving issue discovery and root‑cause analysis efficiency.
Technical Approach
For the most common hidden‑memory scenario—excessive file cache—the solution performs two key steps:
Map a memory page to its inode by reading page->mapping and page->index, then locating the corresponding address_space and file inode.
Reconstruct the full file path by traversing the dentry cache within the mount namespace (e.g., /data/model/xxx.bin).
To avoid the heavy cost of full‑memory scans, SysOM leverages eBPF with BTF (BPF Type Format) to dynamically obtain structure offsets across kernel versions. This enables a lightweight sampling strategy that captures only active cache pages. The tool also reads /proc/kpageflags and /proc/kpagecgroup to correlate pages with containers and workloads.
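To make the role of these proc interfaces concrete, here is a minimal user-space sketch (illustrative only, not SysOM code; the file name pagecache_probe.c is hypothetical). It maps the first page of a file, translates that virtual address into a physical frame number via /proc/self/pagemap, and then reads the per-page flags and owning memory-cgroup inode from /proc/kpageflags and /proc/kpagecgroup. Reading PFNs and these files requires root (CAP_SYS_ADMIN).

    /* pagecache_probe.c - hypothetical sketch: map a file, find the physical
     * frame behind one page, and read its flags and owning memory cgroup.
     * Build: gcc -O2 -o pagecache_probe pagecache_probe.c ; run as root. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static uint64_t read_u64_at(const char *path, uint64_t index)
    {
        uint64_t value = 0;
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror(path); exit(1); }
        if (pread(fd, &value, sizeof(value), index * sizeof(value)) != sizeof(value)) {
            perror("pread"); exit(1);
        }
        close(fd);
        return value;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        long page_size = sysconf(_SC_PAGESIZE);
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Map the first page and touch it so it enters the page cache. */
        unsigned char *p = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        volatile unsigned char sink = p[0]; (void)sink;

        /* /proc/self/pagemap: one 64-bit entry per virtual page; bits 0-54
         * hold the PFN when bit 63 (page present) is set. */
        uint64_t entry = read_u64_at("/proc/self/pagemap",
                                     (uint64_t)(uintptr_t)p / page_size);
        if (!(entry & (1ULL << 63))) { fprintf(stderr, "page not present\n"); return 1; }
        uint64_t pfn = entry & ((1ULL << 55) - 1);

        /* /proc/kpageflags and /proc/kpagecgroup: one 64-bit entry per PFN. */
        uint64_t flags = read_u64_at("/proc/kpageflags", pfn);
        uint64_t cgino = read_u64_at("/proc/kpagecgroup", pfn);

        printf("pfn=%llu flags=0x%llx memcg_inode=%llu\n",
               (unsigned long long)pfn, (unsigned long long)flags,
               (unsigned long long)cgino);
        return 0;
    }

The memory-cgroup inode printed at the end can be matched against the inode numbers of directories under /sys/fs/cgroup to attribute the page to a specific container or pod.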
Implementation Details
Four candidate approaches were evaluated:
Kernel driver (ko) – Simple to implement but highly intrusive, requires kernel‑specific adaptation and carries a risk of system crash.
eBPF – Safe, compatible across kernels, but lacks a robust looping mechanism for full scans.
mincore system call – Works only on files that can still be opened and mapped; closed files cannot be inspected (see the sketch after this list).
kcore – Provides raw memory dump for full‑page‑cache scanning, but lacks data‑structure metadata and incurs high CPU cost on large systems.
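The limitation of the mincore approach is easy to demonstrate with a small sketch (hypothetical code, not part of SysOM): it counts how many pages of a given file are resident in the page cache, but it only works because the file can still be opened and mapped into the caller's address space.

    /* mincore_resident.c - sketch of the mincore approach: count resident
     * page-cache pages for a file that we can still open and map. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) {
            fprintf(stderr, "empty or unreadable file\n"); return 1;
        }

        long page_size = sysconf(_SC_PAGESIZE);
        size_t pages = (st.st_size + page_size - 1) / page_size;

        /* mincore() needs a mapping of the file in our own address space. */
        void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        unsigned char *vec = malloc(pages);
        if (!vec || mincore(map, st.st_size, vec) != 0) { perror("mincore"); return 1; }

        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;          /* bit 0: page is in memory */

        printf("%s: %zu of %zu pages resident (%.1f%%)\n",
               argv[1], resident, pages, 100.0 * resident / pages);
        free(vec);
        return 0;
    }

A file whose last opener has exited can still pin page cache, yet this technique has no handle on it, which is why a mincore-based scanner alone cannot cover the hidden-cache scenario.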
The final design combines kcore for raw memory extraction with eBPF‑BTF for version‑agnostic parsing. Remaining challenges include:
Raw kcore provides no structural information, requiring manual offset calculation.
Full‑memory traversal is CPU‑intensive on machines with large RAM.
Support is needed for both host‑wide and container‑level cache scans.
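To illustrate the first challenge, that raw kcore carries no structural metadata, the following sketch (illustrative, not SysOM code) merely enumerates the loadable segments that /proc/kcore exposes as an ELF core file; everything inside those segments is untyped bytes until structure offsets, for example obtained via BTF, are applied.

    /* kcore_segments.c - sketch: enumerate the LOAD segments exposed by
     * /proc/kcore. Their contents are raw, untyped kernel memory; structure
     * offsets (e.g. from BTF) are needed to interpret anything inside. */
    #include <elf.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/proc/kcore", O_RDONLY);
        if (fd < 0) { perror("open /proc/kcore (need root)"); return 1; }

        Elf64_Ehdr ehdr;
        if (pread(fd, &ehdr, sizeof(ehdr), 0) != sizeof(ehdr)) {
            perror("read ehdr"); return 1;
        }

        Elf64_Phdr *phdrs = calloc(ehdr.e_phnum, sizeof(*phdrs));
        if (!phdrs || pread(fd, phdrs, ehdr.e_phnum * sizeof(*phdrs), ehdr.e_phoff)
                          != (ssize_t)(ehdr.e_phnum * sizeof(*phdrs))) {
            perror("read phdrs"); return 1;
        }

        /* Each PT_LOAD segment maps a range of kernel virtual addresses to an
         * offset in /proc/kcore; a cache scanner seeks to vaddr-derived
         * offsets and decodes struct page / inode fields by hand. */
        for (int i = 0; i < ehdr.e_phnum; i++) {
            if (phdrs[i].p_type != PT_LOAD)
                continue;
            printf("segment %2d: vaddr 0x%016llx size %llu MiB (file offset 0x%llx)\n",
                   i,
                   (unsigned long long)phdrs[i].p_vaddr,
                   (unsigned long long)(phdrs[i].p_memsz >> 20),
                   (unsigned long long)phdrs[i].p_offset);
        }
        free(phdrs);
        close(fd);
        return 0;
    }

Obtaining the structure offsets from BTF rather than hard-coding them is what keeps such a scanner version-agnostic across kernels.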
Case Studies
Case 1 – Container WorkingSet Spike
SysOM identified a pod whose WorkingSet memory kept rising. The diagnostic traced the page-cache usage to log files mounted from the host (/var/log), which consumed ~228 MiB of cache. Recommendation: optimise log writes or limit cache growth.
Case 2 – Shared‑Memory Leak
The tool discovered ~34 GiB of shared memory held by many small files under /dev/shm/ganglia/*. Deleting these files instantly reclaimed memory, confirming a leak in the application’s shared‑memory handling.
Remediation Guidance
Release reclaimable caches manually: echo 1 > /proc/sys/vm/drop_caches (echo 3 additionally drops reclaimable slab such as dentries and inodes).
Delete non-essential files that generate large caches (e.g., excessive log files, stale shared-memory files).
Enable container‑level memory QoS (e.g., ACK cluster memory QoS) to enforce cache limits.
Future Roadmap
SysOM will integrate large‑model reasoning with lightweight inference to provide early anomaly detection, root‑cause suggestions and cross‑platform management. Enhanced kernel‑level observability will fill current blind spots, moving operations from reactive response to proactive control.
Alibaba Cloud Observability
Driving continuous progress in observability technology!