Mastering Linux Memory cgroups: Theory, Configuration, and Real‑World Scripts
This comprehensive guide explains the fundamentals of Linux memory cgroups, walks through their creation and configuration, details core kernel mechanisms such as OOM handling and hierarchical limits, and provides practical Bash scripts for Docker, Kubernetes, and multi‑tenant environments to help engineers reliably control memory usage.
In containerized and micro‑service architectures, unexpected OOM Killer terminations and memory contention often stem from the Linux kernel’s memory cgroup subsystem, which acts as the invisible manager of memory quotas for Docker, Kubernetes and other containers.
1. What is a memory cgroup?
Cgroups (Control Groups) are a kernel feature that isolates resources for groups of processes, similar to an apartment manager allocating utilities to tenants. The memory cgroup, a sub‑system of cgroups, focuses on fine‑grained memory control and is the core mechanism behind Docker’s --memory flag and Kubernetes pod memory limits.
The implementation resides in the kernel source tree (linux/mm/memcontrol.c), and each process reaches its cgroups through a pointer in its task_struct to a css_set.
// Simplified task_struct excerpt
struct task_struct {
    /* ... */
    struct css_set __rcu *cgroups; // points to the cgroup set
    /* ... */
};
1.1 Overview of cgroups
Cgroups provide hierarchical resource limits, statistics, and isolation for CPU, memory, I/O, and more. They originated at Google in 2006 and were merged into the Linux kernel in version 2.6.24.
1.2 Virtual file‑system control logic
Create a group by making a directory under /sys/fs/cgroup/memory (e.g., mkdir mygroup).
Set limits by writing to files such as memory.limit_in_bytes and memory.soft_limit_in_bytes.
Associate processes by writing their PID to cgroup.procs.
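The three steps above can be sketched as a single helper. This is a minimal sketch assuming a cgroup v1 host; the function name cg_setup and the CG_ROOT override are illustrative, not part of any standard tool.

```shell
# Assumption: CG_ROOT defaults to the usual v1 mount point and can be
# pointed at a scratch directory for a dry run.
CG_ROOT="${CG_ROOT:-/sys/fs/cgroup/memory}"

# cg_setup NAME LIMIT_BYTES [PID]
cg_setup() {
    local name="$1" limit_bytes="$2" pid="${3:-}"
    local dir="$CG_ROOT/$name"
    mkdir -p "$dir" || return 1                                     # 1. create the group
    echo "$limit_bytes" > "$dir/memory.limit_in_bytes" || return 1  # 2. set the hard limit
    if [ -n "$pid" ]; then
        echo "$pid" > "$dir/cgroup.procs" || return 1               # 3. attach a process
    fi
    echo "$dir"                                                     # print the group path
}
```

For example, cg_setup mygroup 1073741824 $$ would place the current shell in a 1 GiB group when run as root.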
1.3 Core functions of memory cgroup
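Resource limits: The hard limit (memory.limit_in_bytes) is strictly enforced. When a group reaches it, the kernel first tries to reclaim pages and invokes the OOM killer only if reclaim cannot bring usage back under the limit. The soft limit (memory.soft_limit_in_bytes) may be exceeded and only guides reclaim when the system as a whole is short of memory.
Usage statistics: Files such as memory.usage_in_bytes and memory.stat expose current usage.
Hierarchical limits: A child group is always bounded by the limits of its ancestors, whatever value it sets for itself.
Pressure notifications: memory.pressure_level, registered through cgroup.event_control together with an eventfd, delivers low, medium, and critical memory-pressure events that applications can react to.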
2. Full process of creating a memory cgroup
2.1 Mounting the cgroup file system
Before using memory cgroups, mount the memory controller:
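mount -t cgroup -o memory memory /sys/fs/cgroup/memory
This creates the root memory cgroup, the top of the hierarchy under which all child groups are created. Distributions that use systemd mount the cgroup file systems at boot, so a manual mount is usually needed only on minimal systems.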
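Before mounting, it can help to confirm that the running kernel has the memory controller enabled at all. The sketch below reads the table in /proc/cgroups, whose fourth column is the enabled flag; the function name memcg_enabled is illustrative.

```shell
# memcg_enabled [FILE]: succeeds if the "memory" row of a /proc/cgroups
# style table has its "enabled" column set to 1.
memcg_enabled() {
    local file="${1:-/proc/cgroups}"
    awk '$1 == "memory" { found = 1; ok = ($4 == 1) }
         END { exit !(found && ok) }' "$file"
}
```

A typical call is memcg_enabled || echo "memory controller is disabled". Note that this only checks kernel support, not whether a v1 hierarchy is currently mounted.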
2.2 Creating groups and configuring parameters
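After mounting, create a sub-directory (e.g., mkdir /sys/fs/cgroup/memory/myapp) and configure limits. The commands below assume the current directory is /sys/fs/cgroup/memory:
# Hard limit: 1 GiB
echo 1073741824 > myapp/memory.limit_in_bytes
# Soft limit: 512 MiB
echo 536870912 > myapp/memory.soft_limit_in_bytes
Processes can then be added with echo $PID > myapp/cgroup.procs. Child processes forked afterwards stay in the same group automatically. The cgroup.clone_children flag is unrelated to process membership: it only affects the cpuset controller, where it makes a new child group copy its parent's configuration.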
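Byte counts such as 1073741824 are easy to mistype. The hypothetical helper below converts sizes with a k, m, or g suffix into bytes using binary multiples (1k = 1024), which is also how the v1 limit files interpret those suffixes.

```shell
# to_bytes SIZE: SIZE is a plain integer or an integer followed by
# k/K, m/M, or g/G. Prints the value in bytes.
to_bytes() {
    local value="$1" num unit
    num=${value%[kKmMgG]}        # strip one trailing unit letter, if any
    unit=${value#"$num"}         # whatever was stripped
    case "$num" in
        ''|*[!0-9]*) echo "to_bytes: invalid size '$value'" >&2; return 1 ;;
    esac
    case "$unit" in
        '')  echo "$num" ;;
        k|K) echo $((num * 1024)) ;;
        m|M) echo $((num * 1024 * 1024)) ;;
        g|G) echo $((num * 1024 * 1024 * 1024)) ;;
    esac
}
```

With it, the hard limit can be written as echo "$(to_bytes 1G)" > myapp/memory.limit_in_bytes.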
2.3 Process association and dynamic management
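Processes can be moved between groups by writing their PID to another group's cgroup.procs. This is useful for scaling micro-services or handling load spikes. In cgroup v1, memory the process has already been charged for stays with the old group unless memory.move_charge_at_immigrate is enabled on the destination group.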
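A move can be wrapped in a small function that also checks the result. This is a sketch with an illustrative name (cg_move_pid); it relies on the fact that reading cgroup.procs on a real cgroupfs lists the PIDs currently in the group.

```shell
# cg_move_pid PID TARGET_DIR: move PID into the cgroup at TARGET_DIR
# and verify that it shows up in the group's member list.
cg_move_pid() {
    local pid="$1" target="$2"
    if [ ! -d "$target" ]; then
        echo "cg_move_pid: no such cgroup: $target" >&2
        return 1
    fi
    echo "$pid" > "$target/cgroup.procs" || return 1
    grep -qx "$pid" "$target/cgroup.procs"   # succeeds only if the PID is listed
}
```

For example: cg_move_pid 1234 /sys/fs/cgroup/memory/myapp || echo "move failed".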
3. Kernel implementation details
3.1 Memory accounting and monitoring
The kernel uses struct mem_cgroup to track per-group memory usage via a page_counter. Per-CPU statistics are stored in vmstats_percpu, providing metrics such as RSS and page cache. Real-time stats are exposed through memory.stat, which includes fields like active_anon and inactive_file. The historical peak is recorded in memory.max_usage_in_bytes, useful for capacity planning and leak detection.
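Reading these files from a script is straightforward. The two helpers below are a sketch with illustrative names: one extracts a single field from memory.stat, the other reports the recorded peak in MiB.

```shell
# cg_stat_field DIR FIELD: print the value of FIELD from DIR/memory.stat;
# fails if the field is not present.
cg_stat_field() {
    local dir="$1" field="$2"
    awk -v f="$field" '$1 == f { print $2; found = 1 }
                       END { exit !found }' "$dir/memory.stat"
}

# cg_peak_mib DIR: print the historical peak usage in whole MiB.
cg_peak_mib() {
    local dir="$1" peak
    peak=$(cat "$dir/memory.max_usage_in_bytes") || return 1
    echo $((peak / 1048576))
}
```

For example, cg_stat_field /sys/fs/cgroup/memory/myapp active_anon prints the active anonymous memory of the group in bytes.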
3.2 OOM handling flow
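When a charge would push usage past memory.limit_in_bytes, the kernel first tries to reclaim memory from the group. If reclaim fails, the memcg OOM path (mem_cgroup_out_of_memory() in mm/memcontrol.c) selects the task in the group with the highest badness score, which usually means the largest memory consumer adjusted by oom_score_adj, and kills it. The OOM killer can be toggled via memory.oom_control: writing 1 sets oom_kill_disable, so tasks that hit the limit are paused instead of killed, and writing 0 re-enables it.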
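To get a rough idea of which process would be chosen, the per-process oom_score values under /proc can be compared for the members of a group. This is only an approximation: /proc/PID/oom_score is computed against system-wide memory, so the ranking inside a memory cgroup may differ when oom_score_adj is nonzero. The function name and the optional proc-root argument are illustrative.

```shell
# cg_likely_victim PROCS_FILE [PROC_ROOT]
# PROCS_FILE is a cgroup.procs file; prints "PID SCORE" for the member
# with the highest oom_score, or fails if no member could be read.
cg_likely_victim() {
    local procs_file="$1" proc_root="${2:-/proc}"
    local pid score best_pid="" best_score=-1
    while read -r pid; do
        [ -r "$proc_root/$pid/oom_score" ] || continue   # process may have exited
        score=$(cat "$proc_root/$pid/oom_score")
        if [ "$score" -gt "$best_score" ]; then
            best_score=$score
            best_pid=$pid
        fi
    done < "$procs_file"
    [ -n "$best_pid" ] && echo "$best_pid $best_score"
}
```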
3.3 Hierarchical resource management
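A child group can set its own limit, but it remains constrained by its ancestors: with hierarchical accounting, the effective limit is the smallest limit on the path to the root. The memory.use_hierarchy flag controls whether a parent accounts for the memory usage of its children. Kubernetes relies on this on each node to enforce limits at the pod level above individual containers.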
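The effective limit of a nested group can be estimated by walking up the directory tree and taking the minimum. This sketch assumes a v1 hierarchy with memory.use_hierarchy enabled; the function name cg_effective_limit is illustrative.

```shell
# cg_effective_limit ROOT DIR: print the smallest memory.limit_in_bytes
# found in DIR and each of its ancestors up to and including ROOT.
cg_effective_limit() {
    local root="${1%/}" dir="${2%/}" min="" limit
    case "$dir" in
        "$root"|"$root"/*) ;;
        *) echo "cg_effective_limit: $dir is not under $root" >&2; return 1 ;;
    esac
    while :; do
        if [ -r "$dir/memory.limit_in_bytes" ]; then
            limit=$(cat "$dir/memory.limit_in_bytes")
            if [ -z "$min" ] || [ "$limit" -lt "$min" ]; then
                min=$limit
            fi
        fi
        [ "$dir" = "$root" ] && break
        dir=${dir%/*}            # step up to the parent group
    done
    [ -n "$min" ] && echo "$min"
}
```

A child configured with 2 GiB under a parent limited to 1 GiB therefore reports 1 GiB.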
4. Cooperation with other cgroup subsystems
4.1 CPU subsystem
Memory limits are often paired with CPU quotas (cpu.cfs_quota_us) to provide balanced isolation for compute-intensive containers.
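The CFS quota is expressed in microseconds of CPU time per scheduling period (cpu.cfs_period_us). The sketch below converts a CPU share given in millicores into a quota and applies it together with a memory limit; both function names are illustrative, and the two directories are the group's paths in the v1 cpu and memory hierarchies.

```shell
# cpu_quota_us MILLICORES [PERIOD_US]: 1000 millicores = one full CPU.
cpu_quota_us() {
    local millicores="$1" period_us="${2:-100000}"
    echo $((millicores * period_us / 1000))
}

# apply_cpu_and_memory CPU_DIR MEM_DIR MILLICORES MEM_BYTES
apply_cpu_and_memory() {
    local cpu_dir="$1" mem_dir="$2" millicores="$3" mem_bytes="$4"
    local period_us=100000
    echo "$period_us" > "$cpu_dir/cpu.cfs_period_us" &&
    cpu_quota_us "$millicores" "$period_us" > "$cpu_dir/cpu.cfs_quota_us" &&
    echo "$mem_bytes" > "$mem_dir/memory.limit_in_bytes"
}
```

For example, 1500 millicores with the 100000 µs period used here yields a quota of 150000 µs, that is, one and a half CPUs.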
4.2 Device and blkio subsystems
For storage-intensive workloads, combine blkio.throttle.read_bps_device / blkio.throttle.write_bps_device with memory limits, and restrict device access via the devices controller to create a complete sandbox.
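The throttle files expect one rule per write in the form "MAJOR:MINOR bytes_per_second". The hypothetical helper below builds such a rule from a device number and a rate in MiB/s, rejecting anything that is not a MAJOR:MINOR pair.

```shell
# blkio_rule MAJOR:MINOR MIB_PER_SECOND
blkio_rule() {
    local dev="$1" mib_per_s="$2"
    case "$dev" in
        *[!0-9:]*|*:*:*|:*|*:) dev="" ;;   # stray characters or misplaced colons
        *:*) ;;                            # looks like MAJOR:MINOR
        *) dev="" ;;                       # no colon at all
    esac
    if [ -z "$dev" ]; then
        echo "blkio_rule: expected MAJOR:MINOR, got '$1'" >&2
        return 1
    fi
    echo "$dev $((mib_per_s * 1048576))"
}
```

A 10 MiB/s read cap on device 8:0 could then be applied with blkio_rule 8:0 10 > /sys/fs/cgroup/blkio/myapp/blkio.throttle.read_bps_device, where myapp is an example group name.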
5. Practical use cases
5.1 Docker / Kubernetes resource control
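Docker's --memory flag sets memory.limit_in_bytes on cgroup v1 hosts (memory.max on v2). In Kubernetes, resources.limits.memory is translated into the container's cgroup memory limit, so a container that exceeds it is OOM-killed, while resources.requests.memory is used mainly for scheduling and for determining the pod's QoS class.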
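To see which memory cgroup a container process actually landed in, /proc/PID/cgroup can be parsed. The sketch below handles both the v1 line format (ID:controllers:path) and the v2 format (0::path); the function name is illustrative, and the exact path layout depends on the runtime and its cgroup driver.

```shell
# memcg_path_of FILE: FILE is a /proc/PID/cgroup file. Prints the path
# of the memory cgroup, preferring a v1 "memory" line over the v2 line.
memcg_path_of() {
    local file="$1"
    awk -F: '
        { path = $3; for (i = 4; i <= NF; i++) path = path ":" $i }  # re-join colons inside the path
        $2 ~ /(^|,)memory(,|$)/ { v1 = path }
        $1 == "0" && $2 == ""   { v2 = path }
        END {
            if (v1 != "")      print v1
            else if (v2 != "") print v2
            else               exit 1
        }' "$file"
}
```

For a process with PID 1234, memcg_path_of /proc/1234/cgroup prints the group path relative to the cgroup mount point.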
5.2 Preventing memory leaks in micro‑services
By placing each service in its own memory cgroup with a strict hard limit, a leak in one service will only affect that service, not the whole node.
5.3 Multi‑tenant memory quotas
Create per-tenant groups (e.g., /sys/fs/cgroup/memory/tenant1) and set memory.limit_in_bytes to enforce fair usage across users.
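With more than a handful of tenants, the quotas are easier to manage from a small table. The sketch below assumes a hypothetical config file with one "tenant limit_in_bytes" pair per line and creates or updates one group per tenant.

```shell
# apply_tenant_quotas ROOT CONFIG
# ROOT is the memory cgroup mount point; CONFIG lists "<tenant> <bytes>"
# per line. Blank lines and lines starting with # are ignored.
apply_tenant_quotas() {
    local root="$1" config="$2" tenant limit
    while read -r tenant limit; do
        case "$tenant" in
            ''|'#'*) continue ;;
        esac
        mkdir -p "$root/$tenant" || return 1
        echo "$limit" > "$root/$tenant/memory.limit_in_bytes" || return 1
    done < "$config"
}
```

It would be invoked as apply_tenant_quotas /sys/fs/cgroup/memory tenants.conf, where tenants.conf is an example file name.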
6. cgroup version evolution and troubleshooting
6.1 v1 vs v2 differences
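Hierarchy model: v1 uses an independent hierarchy per subsystem; v2 uses a single unified hierarchy.
Accounting granularity: v1 tracks kernel memory and swap through separate, optional files (memory.kmem.*, memory.memsw.*); v2 folds kernel memory into memory.current, reports swap in memory.swap.*, and adds pressure stall information in memory.pressure.
OOM strategy: v1 kills individual tasks inside the group that hit its limit; v2 adds memory.oom.group, which lets the kernel kill all tasks in a cgroup as one unit.
Typical deployments: Docker uses whichever version the host boots with (v2 is supported since Docker 20.10); Kubernetes recommends v2 for cloud-native workloads.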
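Which model a host is running can be read from its mount table, where v1 hierarchies have file system type cgroup and the unified hierarchy has type cgroup2. A minimal sketch, with an illustrative function name:

```shell
# cgroup_mode [MOUNTS_FILE]: prints v1, v2, hybrid, or none, based on the
# file system type column of a /proc/mounts style file.
cgroup_mode() {
    local mounts="${1:-/proc/mounts}"
    awk '
        $3 == "cgroup2" { v2 = 1 }
        $3 == "cgroup"  { v1 = 1 }
        END {
            if (v1 && v2) print "hybrid"
            else if (v2)  print "v2"
            else if (v1)  print "v1"
            else { print "none"; exit 1 }
        }' "$mounts"
}
```

A result of hybrid means both kinds of hierarchy are mounted, which is worth knowing before choosing which set of file names a script should use.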
6.2 Production configuration tips
Disable swap by setting memory.swappiness to 0 for latency‑sensitive services.
Use soft limits to allow short‑term bursts without triggering OOM.
Monitor memory.usage_in_bytes, memory.max_usage_in_bytes and memory.failcnt with Prometheus and set alerts.
Ensure the correct cgroup version is mounted; avoid mixing v1 and v2 controllers.
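One lightweight way to get the three counters into Prometheus is to print them in the text exposition format and let node_exporter's textfile collector pick up the file. The sketch below does only the formatting; the metric names (memcg_usage_in_bytes and so on) are made up for this example and are not an established convention.

```shell
# memcg_metrics DIR NAME: print usage, peak, and failcnt of the cgroup at
# DIR as Prometheus text-format samples labelled cgroup="NAME".
memcg_metrics() {
    local dir="$1" name="$2" file
    for file in usage_in_bytes max_usage_in_bytes failcnt; do
        [ -r "$dir/memory.$file" ] || continue     # skip files this kernel lacks
        printf 'memcg_%s{cgroup="%s"} %s\n' \
            "$file" "$name" "$(cat "$dir/memory.$file")"
    done
}
```

Redirecting the output to a .prom file in the collector's directory from a cron job or timer is enough for a first alert on failcnt increases.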
6.3 Common troubleshooting scripts
The following Bash scripts help diagnose typical issues such as OOM not firing, inconsistent memory statistics, and process migration failures. They check configuration files, display current usage, and optionally enable the OOM killer.
#!/bin/bash
# Example: check OOM configuration for a given cgroup (cgroup v1)
if [ "$(id -u)" -ne 0 ]; then echo "Run as root" >&2; exit 1; fi
if [ -z "$1" ]; then echo "Usage: $0 <cgroup>" >&2; exit 1; fi
CGROUP=$1
BASE="/sys/fs/cgroup/memory"
# Named CG_PATH rather than PATH: assigning to PATH would break the lookup of cat and grep
CG_PATH="$BASE/$CGROUP"
if [ ! -d "$CG_PATH" ]; then echo "Cgroup not found" >&2; exit 1; fi
echo "Memory limit: $(cat "$CG_PATH/memory.limit_in_bytes")"
echo "OOM control: $(cat "$CG_PATH/memory.oom_control")"
# Re-enable the OOM killer if it is disabled
if grep -q "oom_kill_disable 1" "$CG_PATH/memory.oom_control"; then
    echo 0 > "$CG_PATH/memory.oom_control"
    echo "OOM Killer re-enabled"
fi
Similar scripts can verify memory.stat against memory.usage_in_bytes and safely move a PID between groups.
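For the statistics check, one possible sketch compares memory.usage_in_bytes with the sum of rss and cache from memory.stat. In v1 the usage file is an approximate value maintained for fast access, so a small difference is expected; the function name and the idea of alerting on a large gap are assumptions of this example.

```shell
# memcg_stat_drift DIR: print usage_in_bytes minus (rss + cache) in bytes.
memcg_stat_drift() {
    local dir="$1" usage accounted
    usage=$(cat "$dir/memory.usage_in_bytes") || return 1
    accounted=$(awk '$1 == "rss" || $1 == "cache" { sum += $2 }
                     END { printf "%.0f\n", sum }' "$dir/memory.stat") || return 1
    echo $((usage - accounted))
}
```

A persistently large gap is a hint to look at memory that rss and cache do not cover, such as swap or kernel memory, before suspecting an accounting bug.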
Conclusion
Understanding and correctly configuring Linux memory cgroups is essential for reliable container orchestration, micro‑service stability, and multi‑tenant resource fairness. By combining hard and soft limits, hierarchical inheritance, and coordinated use of CPU, blkio, and device controllers, operators can achieve fine‑grained isolation without sacrificing performance.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
