CPU Resource Isolation in YARN with Linux cgroups
This article introduces Linux cgroups, explains their CPU subsystem files and parameters, demonstrates how to create and configure cgroups, and details how YARN leverages cgroups for CPU resource isolation through configuration settings and code implementations, comparing soft and hard limit approaches.
In production environments, CPU‑intensive tasks often compete for NodeManager resources, causing service jitter. Since Hadoop 2.2, YARN can isolate CPU using Linux cgroups. The article first reviews the cgroup mechanism, its hierarchy, and the various subsystems (cpu, cpuacct, cpuset, memory, blkio, devices, net_cls).
The cgroup filesystem is mounted via mount -t cgroup, typically under /sys/fs/cgroup on CentOS 7. Within the CPU subsystem, creating a new cgroup (e.g., cg_test) generates files such as tasks, cpu.cfs_period_us, cpu.cfs_quota_us, cpu.shares, and cpu.stat, each controlling different aspects of CPU allocation and accounting.
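As a rough illustration of the files listed above, the following Python sketch reads the CPU control files of one cgroup directory into a dictionary. The function name read_cpu_cgroup is hypothetical, and the sketch assumes a cgroup v1 layout in which cpu.stat holds one "key value" pair per line.

```python
from pathlib import Path


def read_cpu_cgroup(cgroup_dir):
    """Read the CPU control files of one cgroup v1 directory into a dict.

    Hypothetical helper for illustration; cgroup_dir is whatever directory
    was created under the mounted cpu subsystem.
    """
    base = Path(cgroup_dir)
    info = {}
    # Single-integer files: scheduling period, quota per period, relative weight.
    for name in ("cpu.cfs_period_us", "cpu.cfs_quota_us", "cpu.shares"):
        info[name] = int((base / name).read_text().strip())
    # cpu.stat is assumed to hold "key value" pairs, one per line.
    stat = {}
    for line in (base / "cpu.stat").read_text().splitlines():
        if not line.strip():
            continue
        key, value = line.split()
        stat[key] = int(value)
    info["cpu.stat"] = stat
    # tasks lists one PID per line; an empty file means no attached processes.
    info["tasks"] = [int(pid) for pid in (base / "tasks").read_text().split()]
    return info
```

On a host that follows the layout described above, the argument would be something like /sys/fs/cgroup/cpu/cg_test; that exact path is an assumption and depends on how the cpu subsystem is mounted.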
An example runs a Python infinite loop that consumes 100% of one CPU core, then limits it by setting cpu.cfs_quota_us to 30000 (30 ms of CPU time per 100 ms period, given the default cpu.cfs_period_us of 100000) and writing the process PID to the cgroup’s tasks file, which reduces its usage to roughly 30%.
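The experiment above can be sketched in Python as follows. The names busy_loop and limit_cpu are hypothetical, the default period of 100000 µs is an assumption about the kernel default, and writing to the real cgroup filesystem normally requires root.

```python
from pathlib import Path


def busy_loop():
    """The CPU burner from the example: spins forever on one core."""
    while True:
        pass


def limit_cpu(cgroup_dir, pid, quota_us, period_us=100000):
    """Cap a process at quota_us / period_us of one CPU via cgroup v1 files.

    Hypothetical helper: cgroup_dir is the cgroup's directory under the
    mounted cpu subsystem. Returns the resulting fraction of one core.
    """
    base = Path(cgroup_dir)
    # On a cgroup filesystem, creating the directory creates the cgroup.
    base.mkdir(parents=True, exist_ok=True)
    (base / "cpu.cfs_period_us").write_text(str(period_us))
    (base / "cpu.cfs_quota_us").write_text(str(quota_us))
    # Writing a PID into tasks moves that process into the cgroup.
    (base / "tasks").write_text(str(pid))
    return quota_us / period_us
```

With quota_us=30000 and the assumed default period, the function returns 0.3, matching the roughly 30% usage observed in the example.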
To enable YARN CPU isolation, two properties must be added to yarn-site.xml:
yarn.nodemanager.container-executor.class: org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class: org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
The LinuxContainerExecutor works with CgroupsLCEResourcesHandler to create a cgroup per container and set limits based on the container’s virtual cores.
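A quick way to confirm that both properties are in place is to parse yarn-site.xml and compare the values. The sketch below is one possible check, not part of YARN; the function name missing_cgroup_properties is hypothetical, and it assumes the usual Hadoop layout of property elements with name and value children.

```python
import xml.etree.ElementTree as ET

# The two properties named in the text and the values they must carry.
REQUIRED = {
    "yarn.nodemanager.container-executor.class":
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
    "yarn.nodemanager.linux-container-executor.resources-handler.class":
        "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler",
}


def missing_cgroup_properties(yarn_site_xml):
    """Return the required property names that are absent or set differently."""
    root = ET.fromstring(yarn_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        found[name] = value
    return sorted(key for key, value in REQUIRED.items() if found.get(key) != value)
```

An empty list means both settings are present with the expected class names; any returned name still needs to be added or corrected.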
Key code from CgroupsLCEResourcesHandler.preExecute() shows how cpu.shares is calculated (default weight 1024 × vCores) and, when strict mode is enabled (yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage set to true), how cpu.cfs_period_us and cpu.cfs_quota_us are derived from the ratio of container vCores to node vCores. The helper method getOverallLimits(float yarnProcessors) computes the period and quota values and handles edge cases where the calculated values are too low.
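The calculation described above can be approximated in Python. This is a sketch of the logic as summarized here, not a copy of the Hadoop source: the constants MAX_QUOTA_US and MIN_PERIOD_US, the (-1, -1) "no limit" result, and the node_processors parameter (the number of physical cores YARN may use on the node) are assumptions made for illustration.

```python
CPU_DEFAULT_WEIGHT = 1024   # cpu.shares per vCore, as stated in the text
MAX_QUOTA_US = 1_000_000    # assumed upper bound for period and quota (microseconds)
MIN_PERIOD_US = 1_000       # assumed smallest usable period or quota (microseconds)


def cpu_shares(container_vcores):
    """Soft limit: relative weight proportional to the container's vCores."""
    return CPU_DEFAULT_WEIGHT * container_vcores


def overall_limits(yarn_processors):
    """Return (period_us, quota_us) for a CPU allowance given in cores.

    Returns (-1, -1), meaning no hard limit, when the values would be too low.
    """
    if yarn_processors < 0.01:
        raise ValueError("number of processors must be at least 0.01")
    quota_us = MAX_QUOTA_US
    period_us = int(MAX_QUOTA_US / yarn_processors)
    if yarn_processors < 1.0:
        # Less than one core: keep the longest period and shrink the quota.
        period_us = MAX_QUOTA_US
        quota_us = int(period_us * yarn_processors)
        if quota_us < MIN_PERIOD_US:
            return (-1, -1)
    if period_us < MIN_PERIOD_US:
        return (-1, -1)
    return (period_us, quota_us)


def strict_limits(container_vcores, node_vcores, node_processors):
    """Hard limit for one container, or None if it owns every vCore on the node."""
    if container_vcores == node_vcores:
        return None
    container_cpu = container_vcores * node_processors / node_vcores
    return overall_limits(container_cpu)
```

Under these assumptions, a 1-vCore container on a node with 8 vCores and 4 usable cores is allowed half a core, so the quota is half of the period.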
The analysis compares the “soft limit” (cpu.shares), which is flexible and yields higher overall utilization, with the “hard limit” (cpu.cfs_period_us plus cpu.cfs_quota_us), which enforces a strict cap but can leave CPU idle when the node is lightly loaded. Example scenarios illustrate how YARN distributes CPU percentages in both normal and strict modes.
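The difference between the two modes can be illustrated with an idealized model in Python. The functions soft_limit_split and hard_limit_split are hypothetical, and the model assumes that a busy container uses as much CPU as it is allowed while an idle one uses none; real scheduling is less tidy.

```python
def soft_limit_split(shares, busy):
    """Fraction of node CPU per container under cpu.shares alone.

    Busy containers divide the whole node in proportion to their shares,
    so capacity left by idle containers is redistributed.
    """
    total = sum(shares[c] for c in busy)
    return {c: (shares[c] / total if c in busy else 0.0) for c in shares}


def hard_limit_split(vcores, node_vcores, busy):
    """Fraction of node CPU per container under a strict quota.

    Each busy container is capped at its vCore ratio, even if the rest of
    the node is idle.
    """
    return {c: (vcores[c] / node_vcores if c in busy else 0.0) for c in vcores}
```

For three containers with 1, 2, and 1 vCores on a 4-vCore node where only the first two are busy, the soft limit gives them one third and two thirds of the node, while the hard limit holds them at 25% and 50% and leaves the remaining 25% unused.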
Overall, the article provides a practical guide to configuring and understanding CPU resource isolation in YARN using Linux cgroups, complete with command‑line examples and relevant Java source snippets.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
