
Mastering Linux Memory cgroups: Theory, Configuration, and Real‑World Scripts

This comprehensive guide explains the fundamentals of Linux memory cgroups, walks through their creation and configuration, details core kernel mechanisms such as OOM handling and hierarchical limits, and provides practical Bash scripts for Docker, Kubernetes, and multi‑tenant environments to help engineers reliably control memory usage.

Deepin Linux

In containerized and micro‑service architectures, unexpected OOM Killer terminations and memory contention can often be traced to how the Linux kernel’s memory cgroup subsystem is configured. This subsystem is the mechanism that enforces memory quotas for Docker, Kubernetes, and other container runtimes.

1. What is a memory cgroup?

Cgroups (Control Groups) are a kernel feature that isolates resources for groups of processes, similar to an apartment manager allocating utilities to tenants. The memory cgroup, a sub‑system of cgroups, focuses on fine‑grained memory control and is the core mechanism behind Docker’s --memory flag and Kubernetes pod memory limits.

The implementation lives in the kernel source tree (mm/memcontrol.c). Each process’s task_struct holds a pointer to a css_set, which records the cgroup state the process belongs to for each controller.

// Simplified task_struct excerpt
struct task_struct {
    /* ... */
    struct css_set __rcu *cgroups; // points to the cgroup set
    /* ... */
};

1.1 Overview of cgroups

Cgroups provide hierarchical resource limits, statistics, and isolation for CPU, memory, I/O, and more. They originated at Google in 2006 and were merged into the Linux kernel in version 2.6.24.

1.2 Virtual file‑system control logic

Create a group by making a directory under /sys/fs/cgroup/memory (e.g., mkdir mygroup).

Set limits by writing to files such as memory.limit_in_bytes and memory.soft_limit_in_bytes.

Associate processes by writing their PID to cgroup.procs.
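The three steps above can be sketched as one small helper. This is a minimal sketch, not a definitive implementation: the function name cg_setup and the CG_ROOT variable are hypothetical, and CG_ROOT defaults to the v1 mount point used in this article. Pointing CG_ROOT at a scratch directory lets you dry‑run the logic without root privileges.

```shell
# Hypothetical helper: create a group, set its hard limit, attach a process.
# CG_ROOT is an assumption of this sketch; it defaults to the v1 memory mount.
CG_ROOT="${CG_ROOT:-/sys/fs/cgroup/memory}"

cg_setup() {
    local name=$1 limit_bytes=$2 pid=$3
    local dir="$CG_ROOT/$name"

    mkdir -p "$dir" || return 1                                     # 1. create the group
    echo "$limit_bytes" > "$dir/memory.limit_in_bytes" || return 1  # 2. set the hard limit
    echo "$pid" > "$dir/cgroup.procs" || return 1                   # 3. attach the process
}
```

Run as root on a cgroup v1 host, cg_setup mygroup 268435456 $$ would place the current shell under a 256 MiB limit.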

1.3 Core functions of memory cgroup

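Resource limits : When a group reaches its hard limit, the kernel first tries to reclaim memory from it and invokes the OOM Killer only if reclaim fails. A soft limit is not enforced during normal operation; a group may exceed it, and the kernel preferentially reclaims from such groups when the system as a whole is short of memory.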

Usage statistics : Files like memory.usage_in_bytes and memory.stat provide real‑time metrics.

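Hierarchical inheritance : With hierarchical accounting enabled, a child group’s usage is also charged to its ancestors, so a child can never use more than the tightest limit above it, whatever its own setting.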

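Pressure notifications : memory.pressure_level, registered together with an eventfd through cgroup.event_control, delivers low, medium, and critical pressure events to user space to aid scheduling decisions.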
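The statistics files listed above are ordinary text files holding a single number, so a snapshot needs nothing more than cat. The sketch below prints usage, peak, and limit in MiB; the function name cg_report is hypothetical, and the directory argument is expected to be a v1 memory cgroup.

```shell
# Hypothetical helper: print usage, peak, and limit of one v1 memory cgroup in MiB.
cg_report() {
    local dir=$1 f bytes
    for f in usage_in_bytes max_usage_in_bytes limit_in_bytes; do
        bytes=$(cat "$dir/memory.$f") || return 1
        printf '%-20s %d MiB\n' "$f" $(( bytes / 1048576 ))
    done
}
```

For example, cg_report /sys/fs/cgroup/memory/myapp prints one line per file. A group without a limit shows a very large number, because the kernel stores "unlimited" as a huge byte count.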

2. Full process of creating a memory cgroup

2.1 Mounting the cgroup file system

Before using memory cgroups, mount the memory controller:

mount -t cgroup -o memory memory /sys/fs/cgroup/memory

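This mounts the hierarchy’s root cgroup, which initially contains every process on the system and is the parent of all groups created below it. On systems that still run cgroup v1, the init system (typically systemd) has usually mounted the controller already, so the command is needed only on minimal or custom setups.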
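Before mounting, it can help to check whether the controller is already mounted. The following sketch scans a mounts table for a cgroup file system carrying the memory option. The function name memcg_v1_mountpoint is hypothetical, and the optional file argument exists only so the logic can be tried against a sample table.

```shell
# Hypothetical helper: print the mount point of the v1 memory controller.
# The optional argument is a mounts table; it defaults to /proc/mounts.
memcg_v1_mountpoint() {
    local mounts=${1:-/proc/mounts}
    # Field 2 = mount point, field 3 = fs type, field 4 = mount options.
    awk '$3 == "cgroup" && $4 ~ /(^|,)memory(,|$)/ { print $2; found = 1; exit }
         END { exit !found }' "$mounts"
}
```

A typical use would be memcg_v1_mountpoint || mount -t cgroup -o memory memory /sys/fs/cgroup/memory, which mounts the controller only when no existing mount is found.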

2.2 Creating groups and configuring parameters

After mounting, create a sub‑directory (e.g., mkdir /sys/fs/cgroup/memory/myapp) and configure limits:

# Run from the controller's mount point so the relative paths resolve
cd /sys/fs/cgroup/memory
# Hard limit: 1 GiB
echo 1073741824 > myapp/memory.limit_in_bytes
# Soft limit: 512 MiB
echo 536870912 > myapp/memory.soft_limit_in_bytes

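Processes can then be added with echo $PID > myapp/cgroup.procs. Children forked by a process in the group start in the same group automatically. The cgroup.clone_children flag is unrelated to this: it affects only the cpuset controller, where it makes a newly created child group copy its parent’s configuration.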

2.3 Process association and dynamic management

Processes can be moved between groups by writing their PID to another group’s cgroup.procs. This is useful for scaling micro‑services or handling load spikes.
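A migration is a single write, but it fails in confusing ways when the PID has already exited or the target group does not exist. The sketch below adds those two checks; the function name cg_move_pid is hypothetical.

```shell
# Hypothetical helper: move a live process into another group.
# Returns non-zero when the PID cannot be signalled or the target is missing.
cg_move_pid() {
    local pid=$1 target=$2
    # kill -0 sends no signal; it only tests that the process exists and is accessible.
    kill -0 "$pid" 2>/dev/null || { echo "PID $pid is not running or not accessible" >&2; return 1; }
    [ -d "$target" ] || { echo "no such cgroup: $target" >&2; return 1; }
    echo "$pid" > "$target/cgroup.procs"
}
```

Writing a PID to cgroup.procs moves the whole thread group, so one call per process is enough. Memory the process charged before the move generally stays accounted to the old group unless charge migration is configured.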

3. Kernel implementation details

3.1 Memory accounting and monitoring

The kernel uses struct mem_cgroup to track per‑group memory usage via a page_counter. Per‑CPU statistics are stored in vmstats_percpu, providing metrics such as RSS and page cache. Real‑time stats are exposed through memory.stat, which includes fields like active_anon and inactive_file. The historical peak is recorded in memory.max_usage_in_bytes, which is useful for capacity planning and leak detection.
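Because memory.stat is a list of "name value" pairs in bytes, a single field can be extracted with awk. The sketch below converts one counter to MiB; the function name memstat_mib is hypothetical.

```shell
# Hypothetical helper: print one memory.stat counter converted to MiB.
# Exits non-zero when the requested key is not present in the file.
memstat_mib() {
    local stat_file=$1 key=$2
    awk -v k="$key" '$1 == k { printf "%d\n", $2 / 1048576; found = 1 }
                     END { exit !found }' "$stat_file"
}
```

For example, memstat_mib /sys/fs/cgroup/memory/myapp/memory.stat rss reports the anonymous memory of the group. The match is exact, so rss does not pick up the hierarchical total_rss line.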

3.2 OOM handling flow

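When a charge would push usage above memory.limit_in_bytes, the kernel first tries to reclaim pages from the group. If reclaim cannot free enough memory, the memcg OOM path (mem_cgroup_out_of_memory() in mm/memcontrol.c) picks the task in the group with the highest badness score, which is driven mainly by its memory footprint and adjusted by oom_score_adj, and kills it. The OOM Killer can be toggled per group through memory.oom_control: writing 0 enables it (the default) and writing 1 disables it, in which case tasks that hit the limit are paused until memory is freed instead of being killed.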
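To get a rough idea of which process would be chosen, the members of a group can be ranked by the score the kernel publishes in /proc/PID/oom_score. This is a sketch under stated assumptions: the function name cg_oom_candidates is hypothetical, the proc_root parameter exists only so sample data can be used, and the published score is computed against total system memory, so the ordering is an approximation of what a memcg OOM would pick.

```shell
# Hypothetical helper: rank the processes of a group by their OOM score.
# Output: "<score> <pid>" per line, most likely victim first.
cg_oom_candidates() {
    local dir=$1 proc_root=${2:-/proc} pid score
    while read -r pid; do
        # A process may exit between listing and reading; skip it silently.
        score=$(cat "$proc_root/$pid/oom_score" 2>/dev/null) || continue
        echo "$score $pid"
    done < "$dir/cgroup.procs" | sort -rn
}
```

Lowering a critical process’s oom_score_adj moves it down this list, which is the usual way to protect it inside a shared group.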

3.3 Hierarchical resource management

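A child group is always bounded by its ancestors: it may set a stricter limit of its own, but a looser one has no effect because its usage is also charged to the parent. The memory.use_hierarchy flag controls whether a parent accounts for the memory used by its children. Container runtimes rely on this so that a pod‑level or node‑level group reflects the total of everything beneath it.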
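The effective limit of a group is therefore the smallest memory.limit_in_bytes found on the path from the group up to the root. The sketch below walks that path; the function name cg_effective_limit is hypothetical, and it assumes hierarchical accounting is enabled. The kernel reports the same value as hierarchical_memory_limit in memory.stat, which can serve as a cross‑check.

```shell
# Hypothetical helper: smallest limit between a group and the hierarchy root.
# Usage: cg_effective_limit <root-dir> <relative-group-path>
cg_effective_limit() {
    local root=${1%/} rel=$2 dir limit min=""
    dir="$root/$rel"
    while :; do
        if [ -r "$dir/memory.limit_in_bytes" ]; then
            limit=$(cat "$dir/memory.limit_in_bytes")
            if [ -z "$min" ] || [ "$limit" -lt "$min" ]; then min=$limit; fi
        fi
        # Stop once the root has been examined (or the path is exhausted).
        case "$dir" in "$root"|"") break ;; esac
        dir=${dir%/*}
    done
    echo "$min"
}
```

With a 1 GiB parent and a 2 GiB child, cg_effective_limit /sys/fs/cgroup/memory parent/child prints 1073741824, showing that the looser child setting has no effect.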

4. Cooperation with other cgroup subsystems

4.1 CPU subsystem

Memory limits are often paired with CPU quotas (cpu.cfs_quota_us) to provide balanced isolation for compute‑intensive containers.
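The quota is expressed in microseconds of CPU time per scheduling period (cpu.cfs_period_us, 100000 by default), so a CPU allowance has to be converted before it is written. A minimal sketch of that arithmetic follows; the function name cpu_quota_us and the use of millicores as the input unit are assumptions of this example.

```shell
# Hypothetical helper: convert a CPU allowance in millicores into a CFS quota.
# quota_us = millicores * period_us / 1000
cpu_quota_us() {
    local millicores=$1 period_us=${2:-100000}
    echo $(( millicores * period_us / 1000 ))
}
```

For instance, cpu_quota_us 1500 prints 150000, the value to write into cpu.cfs_quota_us to allow one and a half CPUs alongside the group’s memory limit.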

4.2 Device and blkio subsystems

For storage‑intensive workloads, combine blkio.throttle.read_bps_device and blkio.throttle.write_bps_device with memory limits, and restrict device access via the devices controller to create a complete sandbox.
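The throttle files expect one rule per line in the form "major:minor bytes_per_second". The sketch below builds such a rule from a rate in MiB/s; the function name blkio_rule is hypothetical, and the device numbers in the usage note are only an example.

```shell
# Hypothetical helper: build a throttle rule "<major>:<minor> <bytes-per-second>".
blkio_rule() {
    local major=$1 minor=$2 mib_per_s=$3
    echo "$major:$minor $(( mib_per_s * 1048576 ))"
}
```

blkio_rule 8 0 10 prints "8:0 10485760". Redirecting that output into a group’s blkio.throttle.read_bps_device would cap reads at 10 MiB/s on device 8:0, which is commonly the first SCSI or SATA disk; check the numbers with ls -l /dev or lsblk on the actual host.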

5. Practical use cases

5.1 Docker / Kubernetes resource control

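Docker’s --memory flag maps directly to memory.limit_in_bytes. In Kubernetes, resources.limits.memory is translated by the container runtime into the same cgroup limit, so a container cannot exceed it. resources.requests.memory is used mainly for scheduling and eviction decisions and is not enforced as a hard cgroup limit.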
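To compare what was requested with what the kernel enforces, it helps to convert a Docker‑style size such as 512m or 1g into the byte count that appears in memory.limit_in_bytes. The following is a minimal sketch: the function name mem_to_bytes is hypothetical, it handles only whole numbers with the suffixes b, k, m, and g, and it does not cover Kubernetes quantities such as 512Mi.

```shell
# Hypothetical helper: convert a Docker-style size (e.g. 512m, 1g) to bytes.
mem_to_bytes() {
    local value=$1 num unit
    num=${value%[bBkKmMgG]}      # strip one trailing unit letter, if any
    unit=${value#"$num"}         # whatever was stripped
    # Reject empty input, non-digits, and leading zeros (would parse as octal).
    case "$num" in ''|*[!0-9]*|0?*) echo "invalid size: $value" >&2; return 1 ;; esac
    case "$unit" in
        ''|b|B) echo "$num" ;;
        k|K)    echo $(( num * 1024 )) ;;
        m|M)    echo $(( num * 1024 * 1024 )) ;;
        g|G)    echo $(( num * 1024 * 1024 * 1024 )) ;;
    esac
}
```

A container started with --memory 512m should then show the value printed by mem_to_bytes 512m in its memory.limit_in_bytes file.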

5.2 Preventing memory leaks in micro‑services

By placing each service in its own memory cgroup with a strict hard limit, a leak in one service will only affect that service, not the whole node.

5.3 Multi‑tenant memory quotas

Create per‑tenant groups (e.g., /sys/fs/cgroup/memory/tenant1) and set memory.limit_in_bytes in each to enforce fair usage across users.
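With more than a handful of tenants, the groups are easier to manage from a quota table. The sketch below reads "tenant limit‑in‑bytes" pairs from standard input and creates one group per line; the function name provision_tenants and the table format are assumptions of this example.

```shell
# Hypothetical helper: read "<tenant> <limit-bytes>" pairs from stdin and
# create one group per tenant under the given root directory.
provision_tenants() {
    local root=$1 tenant limit
    while read -r tenant limit; do
        case "$tenant" in ''|'#'*) continue ;; esac   # skip blank lines and comments
        mkdir -p "$root/$tenant" || return 1
        echo "$limit" > "$root/$tenant/memory.limit_in_bytes" || return 1
    done
}
```

It would be invoked as provision_tenants /sys/fs/cgroup/memory < quotas.txt, where quotas.txt is a hypothetical file containing lines such as "tenant1 1073741824".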

6. cgroup version evolution and troubleshooting

6.1 v1 vs v2 differences

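Layer model : v1 uses independent hierarchies per subsystem; v2 uses a single unified hierarchy.

Accounting granularity : v1 tracks kernel memory and swap through separate, optional knobs (memory.kmem.* and memory.memsw.*); v2 accounts for kernel memory by default and exposes swap through memory.swap.* files.

OOM strategy : v1 kills a single task from the offending group; v2 adds memory.oom.group, which lets the kernel kill every task in the group as one unit.

Typical deployments : Docker and Kubernetes support both versions and follow whatever the host has mounted. Older distributions default to v1, while current ones default to v2, which Kubernetes recommends for cloud‑native workloads.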
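Since the file names used in this article exist only on v1, a script should first find out which layout the host uses. One simple check, sketched below, relies on the cgroup.controllers file that is present only in a v2 hierarchy; the function name cgroup_version is hypothetical.

```shell
# Hypothetical helper: report which cgroup layout is mounted at a root directory.
cgroup_version() {
    local root=${1:-/sys/fs/cgroup}
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2          # this file exists only in a cgroup v2 hierarchy
    elif [ -d "$root/memory" ]; then
        echo v1          # per-controller directory, as used throughout this article
    else
        echo unknown
    fi
}
```

On a v2 host the equivalents of the files discussed here are memory.max for the hard limit and memory.current for usage. Hosts in hybrid mode, with v1 controllers and a separate unified mount, are reported as v1 by this check.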

6.2 Production configuration tips

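Set memory.swappiness to 0 for latency‑sensitive services so the kernel avoids swapping out the group’s anonymous pages. This does not disable swap system‑wide; use swapoff or a node without swap for that.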

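Use soft limits to mark a group’s expected working set while leaving the hard limit higher, so short‑term bursts above the soft limit are tolerated and the OOM Killer can fire only at the hard limit.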

Monitor memory.usage_in_bytes, memory.max_usage_in_bytes and memory.failcnt with Prometheus and set alerts.

Ensure the correct cgroup version is mounted; avoid mixing v1 and v2 controllers.
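The monitoring tip above can be prototyped with a small check that a cron job or an exporter’s text‑file collector could call. This is a sketch under stated assumptions: the function name cg_check and the 90 percent default threshold are choices made for this example, not recommendations from the kernel documentation.

```shell
# Hypothetical helper: return non-zero when a group needs attention.
# Exit codes: 0 = healthy, 1 = warning, 2 = files unreadable or limit invalid.
cg_check() {
    local dir=$1 threshold_pct=${2:-90} usage limit failcnt pct
    usage=$(cat "$dir/memory.usage_in_bytes") || return 2
    limit=$(cat "$dir/memory.limit_in_bytes") || return 2
    failcnt=$(cat "$dir/memory.failcnt") || return 2
    [ "$limit" -gt 0 ] || return 2
    if [ "$failcnt" -gt 0 ]; then
        echo "WARN: limit was hit $failcnt times"
        return 1
    fi
    pct=$(( usage * 100 / limit ))
    if [ "$pct" -ge "$threshold_pct" ]; then
        echo "WARN: usage at ${pct}% of limit"
        return 1
    fi
    echo "OK: usage at ${pct}% of limit"
}
```

A non‑zero memory.failcnt means allocations have already run into the limit at least once, so it is treated as a warning even when current usage is low.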

6.3 Common troubleshooting scripts

The following Bash script helps diagnose one typical issue, the OOM Killer not firing; inconsistent memory statistics and process migration failures can be checked in the same style. It verifies that the group exists, displays its limit and OOM settings, and re‑enables the OOM Killer if it has been disabled.

#!/bin/bash
# Example: check OOM configuration for a given cgroup
if [ "$(id -u)" -ne 0 ]; then echo "Run as root"; exit 1; fi
if [ -z "$1" ]; then echo "Usage: $0 <cgroup-name>"; exit 1; fi
CGROUP=$1
BASE="/sys/fs/cgroup/memory"
# Do not name this variable PATH: that would clobber the command search path
CG_PATH="$BASE/$CGROUP"
if [ ! -d "$CG_PATH" ]; then echo "Cgroup not found"; exit 1; fi
echo "Memory limit: $(cat "$CG_PATH/memory.limit_in_bytes")"
echo "OOM control: $(cat "$CG_PATH/memory.oom_control")"
# Re-enable the OOM Killer if it is disabled
if grep -q "oom_kill_disable 1" "$CG_PATH/memory.oom_control"; then
  echo 0 > "$CG_PATH/memory.oom_control"
  echo "OOM Killer re-enabled"
fi

Similar scripts can verify memory.stat against memory.usage_in_bytes and safely move a PID between groups.
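For the statistics check, the kernel documentation describes memory.usage_in_bytes as an approximate value and suggests summing rss and cache from memory.stat when a more exact figure is needed. The sketch below prints the difference between the two; the function name cg_stat_drift is hypothetical, and the comparison ignores swap and kernel memory, which may account for part of the gap.

```shell
# Hypothetical helper: compare memory.usage_in_bytes with rss + cache.
# Prints the difference in bytes (usage minus rss+cache).
cg_stat_drift() {
    local dir=$1 usage exact
    usage=$(cat "$dir/memory.usage_in_bytes") || return 1
    exact=$(awk '$1 == "rss" || $1 == "cache" { sum += $2 }
                 END { printf "%.0f\n", sum }' "$dir/memory.stat") || return 1
    echo $(( usage - exact ))
}
```

A small difference is expected on a busy system. A gap that keeps growing is worth investigating, for example by looking at the kernel memory counters of the same group.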

Conclusion

Understanding and correctly configuring Linux memory cgroups is essential for reliable container orchestration, micro‑service stability, and multi‑tenant resource fairness. By combining hard and soft limits, hierarchical inheritance, and coordinated use of CPU, blkio, and device controllers, operators can achieve fine‑grained isolation without sacrificing performance.
