
CPU Resource Isolation in YARN with Linux cgroups

This article introduces Linux cgroups, explains their CPU subsystem files and parameters, demonstrates how to create and configure cgroups, and details how YARN leverages cgroups for CPU resource isolation through configuration settings and code implementations, comparing soft and hard limit approaches.

Big Data Technology & Architecture

In production environments, CPU‑intensive tasks often compete for NodeManager resources, causing service jitter. Since Hadoop 2.2, YARN can isolate CPU using Linux cgroups. The article first reviews the cgroup mechanism, its hierarchy, and the various subsystems (cpu, cpuacct, cpuset, memory, blkio, devices, net_cls).

The cgroup filesystem is mounted via mount -t cgroup, typically under /sys/fs/cgroup on CentOS 7. Within the CPU subsystem, creating a new cgroup (e.g., cg_test) generates files such as tasks, cpu.cfs_period_us, cpu.cfs_quota_us, cpu.shares, and cpu.stat, each controlling different aspects of CPU allocation and accounting.
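As a rough illustration of how these files fit together, the following Python sketch reads the quota and period of a cgroup and parses cpu.stat. It assumes the cgroup v1 file layout described above; the function names are hypothetical, and the directory argument would be something like /sys/fs/cgroup/cpu/cg_test on a real host.

```python
from pathlib import Path


def read_cpu_limit(cgroup_dir):
    """Return the CFS hard cap in CPUs (quota / period), or None if uncapped."""
    base = Path(cgroup_dir)
    quota = int((base / "cpu.cfs_quota_us").read_text().strip())
    period = int((base / "cpu.cfs_period_us").read_text().strip())
    if quota < 0:  # a quota of -1 means no hard cap is set
        return None
    return quota / period


def read_cpu_stat(cgroup_dir):
    """Parse cpu.stat, which holds one "key value" pair per line."""
    stats = {}
    for line in (Path(cgroup_dir) / "cpu.stat").read_text().splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats
```

A growing nr_throttled counter in the parsed cpu.stat is a quick sign that the tasks in the cgroup are hitting their quota.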

An example runs a Python infinite loop that consumes 100% of one CPU core, then caps it by setting cpu.cfs_quota_us to 30000 and writing the process PID to the cgroup's tasks file. With the default cpu.cfs_period_us of 100000, the process may run for only 30 ms in every 100 ms period, so its usage drops to roughly 30%.
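The same steps can be sketched in Python rather than with shell commands. This is a minimal sketch, not the original article's commands: the function name and its parameters are hypothetical, the cpu_root path (for example /sys/fs/cgroup/cpu) is an assumption about where the CPU subsystem is mounted, and writing to a real cgroup filesystem requires root.

```python
from pathlib import Path


def limit_process_cpu(cpu_root, name, pid, quota_us=30000, period_us=100000):
    """Create cgroup `name` under cpu_root and cap `pid` at quota_us/period_us."""
    cgroup = Path(cpu_root) / name
    # On a mounted cgroup filesystem, creating the directory creates the cgroup.
    cgroup.mkdir(parents=True, exist_ok=True)
    (cgroup / "cpu.cfs_period_us").write_text(str(period_us))
    (cgroup / "cpu.cfs_quota_us").write_text(str(quota_us))
    # Writing a PID to tasks moves that process into the cgroup.
    (cgroup / "tasks").write_text(str(pid))
    return cgroup
```

Calling limit_process_cpu("/sys/fs/cgroup/cpu", "cg_test", pid) with the PID of the busy loop would correspond to the 30% cap described above.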

To enable YARN CPU isolation, two properties must be added to yarn-site.xml:

yarn.nodemanager.container-executor.class: org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class: org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
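One way to confirm that both properties are in place is to parse yarn-site.xml and compare the values. The sketch below assumes the standard Hadoop configuration layout of property elements with name and value children; the helper name and the EXAMPLE string are illustrative only.

```python
import xml.etree.ElementTree as ET

# The two properties and values listed above.
REQUIRED = {
    "yarn.nodemanager.container-executor.class":
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
    "yarn.nodemanager.linux-container-executor.resources-handler.class":
        "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler",
}

# Illustrative yarn-site.xml content with both properties set.
EXAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
  </property>
</configuration>"""


def missing_properties(yarn_site_xml):
    """Return the required property names that are absent or set differently."""
    root = ET.fromstring(yarn_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        found[name] = (prop.findtext("value") or "").strip()
    return sorted(k for k, v in REQUIRED.items() if found.get(k) != v)
```

An empty result from missing_properties means both settings match the values listed above.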

The LinuxContainerExecutor works with CgroupsLCEResourcesHandler to create a cgroup per container and set limits based on the container’s virtual cores.

Key code from CgroupsLCEResourcesHandler.preExecute() shows how cpu.shares is calculated (default weight 1024 × vCores) and, when strict mode is enabled, how cpu.cfs_period_us and cpu.cfs_quota_us are derived from the ratio of container vCores to node vCores. The helper method getOverallLimits(float yarnProcessors) computes appropriate period and quota values, handling edge cases where the calculated values are too low.
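The calculation can be approximated in Python as follows. This is a loose port written from memory of the Hadoop 2.x source, not the Java code itself: the constants (a default weight of 1024, a maximum quota of 1,000,000 µs, a minimum period of 1,000 µs) and the yarn_processors parameter (the number of physical cores YARN is allowed to use) are assumptions that should be checked against the Hadoop version in use.

```python
CPU_DEFAULT_WEIGHT = 1024    # assumed default cpu.shares weight per vCore
MAX_QUOTA_US = 1000 * 1000   # assumed upper bound for cfs_quota_us
MIN_PERIOD_US = 1000         # assumed lower bound for cfs_period_us


def cpu_shares(container_vcores):
    """Soft limit: the cpu.shares value for a container."""
    return CPU_DEFAULT_WEIGHT * container_vcores


def overall_limits(yarn_processors):
    """Return (period_us, quota_us) for a cap of yarn_processors CPUs."""
    if yarn_processors < 0.01:
        raise ValueError("number of processors must be positive")
    quota_us = MAX_QUOTA_US
    period_us = int(MAX_QUOTA_US / yarn_processors)
    if yarn_processors < 1.0:
        # Below one CPU: fix the period and shrink the quota instead.
        period_us = MAX_QUOTA_US
        quota_us = max(int(period_us * yarn_processors), MIN_PERIOD_US)
    if period_us < MIN_PERIOD_US:
        # Period too small to be valid: -1 disables the hard cap.
        period_us = quota_us = -1
    return period_us, quota_us


def strict_limits(container_vcores, node_vcores, yarn_processors):
    """Strict mode: scale the cap by the container's fraction of node vCores."""
    container_cpu = container_vcores * yarn_processors / node_vcores
    return overall_limits(container_cpu)
```

Under these assumptions, a 2-vCore container on an 8-vCore node with 8 usable cores gets a cap of two CPUs, expressed as a 500,000 µs period with a 1,000,000 µs quota.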

The analysis compares the "soft limit" (cpu.shares), which only sets relative weights and therefore lets busy containers use otherwise idle CPU, with the "hard limit" (cpu.cfs_period_us plus cpu.cfs_quota_us), which enforces a strict cap but leaves CPU unused when the node is lightly loaded. Example scenarios illustrate how YARN distributes CPU percentages in both normal and strict modes.
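The difference between the two modes can be illustrated with a toy model. This is a simplified sketch, not YARN code: it assumes every busy container is multi-threaded enough to use any CPU it is offered, that the containers' vCores sum to no more than the node's vCores, and that idle containers use nothing. The function names are hypothetical.

```python
def soft_limit_split(vcores, busy):
    """cpu.shares only: busy containers split the whole node by weight."""
    weights = [1024 * v if b else 0 for v, b in zip(vcores, busy)]
    total = sum(weights)
    return [w / total if total else 0.0 for w in weights]


def hard_limit_split(vcores, busy, node_vcores):
    """Strict mode: a busy container is capped at vcores / node_vcores."""
    return [v / node_vcores if b else 0.0 for v, b in zip(vcores, busy)]


if __name__ == "__main__":
    vcores = [2, 2, 4]            # three containers on an 8-vCore node
    busy = [True, False, False]   # only the first one is doing work
    print("soft:", soft_limit_split(vcores, busy))
    print("hard:", hard_limit_split(vcores, busy, node_vcores=8))
```

In this model a lone busy 2-vCore container may take the whole node under the soft limit but stays at a quarter of it under the hard limit. When all containers are busy, both modes give the same split.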

Overall, the article provides a practical guide to configuring and understanding CPU resource isolation in YARN using Linux cgroups, complete with command‑line examples and relevant Java source snippets.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
