Features, Configuration Parameters, and Implementation Details of Hadoop Capacity Scheduler
The article provides a comprehensive overview of Hadoop's Capacity Scheduler, describing its resource‑allocation features, configurable XML parameters, queue access controls, dynamic configuration updates, and the internal workflow of application initialization and resource scheduling within YARN.
Features
The Capacity Scheduler allocates resources by queue, allowing each queue to define a minimum guaranteed capacity and a usage limit, while also permitting per‑user limits to prevent abuse. Unused resources in a queue can be temporarily shared with other queues.
Capacity guarantee: Administrators set minimum guarantees and maximum limits for each queue; all applications submitted to the queue share these resources.
Flexibility: Surplus resources in a queue can be temporarily offered to other queues that need them, and are reclaimed when the original queue receives new applications.
Multi‑tenancy: Supports multiple users sharing a cluster and running multiple applications simultaneously, with optional constraints to avoid monopolization.
Security guarantee: Each queue has an ACL defining allowed users; users can be granted view or control permissions (e.g., killing applications). Administrators can also designate queue and cluster administrators.
Dynamic configuration: Administrators can modify configuration parameters on‑line to manage the cluster.
Capacity Scheduler Functions
The scheduler uses a configuration file capacity-scheduler.xml located in the conf directory.
Queue parameters are referenced as yarn.scheduler.capacity.<queueName>.<parameter> .
Key resource‑allocation parameters include:
capacity : Minimum resource capacity of a queue (percentage). The sum of all queues should be less than 100%.
maximum-capacity : Upper bound of resource usage for a queue.
minimum-user-limit-percent : Minimum guaranteed resources per user (percentage).
user-limit-factor : Maximum share of resources a user may consume (percentage).
Application‑count limits:
maximum-applications : Maximum number of applications (running or waiting) in the cluster or a queue. Default is 10000. Set via yarn.scheduler.capacity.maximum-applications for the cluster or yarn.scheduler.capacity.<queue-path>.maximum-applications for a specific queue.
maximum-am-resource-percent : Upper limit of resources allocated to ApplicationMasters. Configured cluster‑wide with yarn.scheduler.capacity.maximum-am-resource-percent or per‑queue with yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent .
Queue access control parameters:
state : Queue state, either STOPPED or RUNNING . A stopped queue rejects new submissions but allows running applications to finish.
acl_submit_application : Specifies which users or groups may submit applications to the queue; permissions are inherited by child queues.
acl_administer_queue : Designates a queue administrator who can manage all applications in the queue, including killing them; also inherited.
To apply dynamic changes, edit conf/capacity-scheduler.xml and run yarn rmadmin -refreshQueues .
The scheduler does not support dynamically reducing the number of queues, and all updated configuration values must be valid or the file will fail to load.
Implementation Details
Application Initialization
When an application is submitted to the ResourceManager, a SchedulerEventType.APP_ADDED event is sent to the Capacity Scheduler. The scheduler creates a FiCaSchedulerApp object, performs validity checks, and places the application into the appropriate leaf queue.
User must have submit permission for the leaf queue.
The queue and all its ancestors must be in RUNNING state.
The number of applications already submitted to the queue must be below the configured limit.
The number of applications submitted by the user must be below the user‑level limit.
Resource Scheduling
Upon receiving a SchedulerEventType.NODE_UPDATE from a NodeManager heartbeat, the Capacity Scheduler processes two types of information:
Newly launched containers: the scheduler sends a RMContainerEventType.LAUNCHED to the ResourceManager and removes the container from the timeout queue.
Completed containers: the ResourceManager reclaims their resources for future allocation.
After handling these, idle resources on the node are allocated to applications following three steps:
Queue selection: Starting from the root, queues are traversed in order of increasing resource‑usage ratio (used resources / minimum capacity). The first leaf queue with available capacity is chosen.
Application selection: Within the selected leaf queue, applications are ordered by submission time (earlier Application ID first), and the earliest application is chosen.
Container request selection: For the chosen application, the scheduler prefers higher‑priority container requests, and for equal priority prefers node‑local, then rack‑local, then any‑local containers.
Containers contain five pieces of information: priority, preferred node, resource amount, number of containers, and whether locality relaxation is allowed.
The scheduler uses a resource comparator to compare resource usage. The default DefaultResourceCalculator considers only memory, while the DominantResourceCalculator applies the DRF algorithm considering both memory and CPU. The comparator can be set via yarn.scheduler.capacity.resource-calculator.
APP_REMOVED: Triggered when an application finishes or is killed; the scheduler first kills all unfinished containers and then removes the application data structures.
NODE_ADDED: When a new node joins the cluster, the scheduler records the NodeManager information and increases total cluster resources.
NODE_REMOVED: When a node leaves, the scheduler removes its information, reduces total resources, and kills any running containers on that node.
CONTAINER_EXPIRED: If an allocated container is not used within a timeout, the scheduler receives this event and reclaims the container.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
