Understanding Hadoop YARN Schedulers: FIFO, Capacity, and Fair Scheduler
This article explains the role of the YARN Scheduler, compares the FIFO, Capacity, and Fair schedulers, and walks through their configuration, including XML snippets for the Capacity and Fair schedulers, queue hierarchies, and preemption settings, with practical guidance on resource allocation in Hadoop clusters.
In Hadoop YARN, the Scheduler allocates resources to applications and offers several pluggable strategies.
Three main schedulers are available: FIFO Scheduler, which processes applications in submission order without configuration; Capacity Scheduler, suitable for shared clusters by defining hierarchical queues and allowing both large and small jobs to obtain resources; and Fair Scheduler, which dynamically shares resources among jobs and can also support FIFO within queues.
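The article presents a comparison diagram (not shown) illustrating the trade-offs. With FIFO, small jobs can be blocked behind large ones. With Capacity, a dedicated queue lets small jobs start immediately, but the capacity reserved for that queue means large jobs may finish later than they would under FIFO. With Fair, resources are rebalanced dynamically among running jobs, though a newly submitted job may wait briefly until containers held by running jobs are released.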
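Because the scheduler is pluggable, switching between the three is a matter of configuration. The fragment below is a minimal sketch, assuming a stock Apache Hadoop distribution, of how the scheduler class might be selected in yarn-site.xml through the yarn.resourcemanager.scheduler.class property. It picks the Fair Scheduler purely as an example.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml (fragment): selects the ResourceManager scheduler. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <!-- Alternatives shipped with Apache Hadoop:
         org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
         org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler -->
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
</configuration>
```

Changing the scheduler class typically requires a ResourceManager restart, and vendor distributions may ship a different default than upstream Hadoop.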
2. Capacity Scheduler Configuration
The Capacity Scheduler enables multiple organizations to share a cluster by assigning each a portion of resources through hierarchical queues. Queues can be defined with percentages of parent resources, maximum resource limits, and user limits. Example XML configuration snippets are provided:
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b,c</value>
  <description>The queues at this level (root is the root queue).</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>a1,a2</value>
  <description>The child queues of queue a.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.queues</name>
  <value>b1,b2,b3</value>
  <description>The child queues of queue b.</description>
</property>
Queues are defined using comma-separated lists, and each sub-queue can specify its share of the parent's resources. Administrators can apply configuration changes at runtime, without restarting the ResourceManager, by running yarn rmadmin -refreshQueues.
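The queue lists above only declare the hierarchy; each queue also needs a capacity. The following capacity-scheduler.xml fragment is a sketch under assumed numbers: the percentages, the maximum-capacity cap, and the user-limit-factor value are illustrative choices, not values from the original article. It repeats the queue lists so that it stands on its own. The one firm constraint is that the capacities of sibling queues should add up to 100.

```xml
<?xml version="1.0"?>
<!-- capacity-scheduler.xml (fragment); all numeric values are illustrative. -->
<configuration>
  <!-- Hierarchy: root -> a, b, c; a -> a1, a2; b -> b1, b2, b3. -->
  <property><name>yarn.scheduler.capacity.root.queues</name><value>a,b,c</value></property>
  <property><name>yarn.scheduler.capacity.root.a.queues</name><value>a1,a2</value></property>
  <property><name>yarn.scheduler.capacity.root.b.queues</name><value>b1,b2,b3</value></property>

  <!-- Top-level shares of the cluster: 40 + 30 + 30 = 100. -->
  <property><name>yarn.scheduler.capacity.root.a.capacity</name><value>40</value></property>
  <property><name>yarn.scheduler.capacity.root.b.capacity</name><value>30</value></property>
  <property><name>yarn.scheduler.capacity.root.c.capacity</name><value>30</value></property>

  <!-- Shares of queue a's capacity: 50 + 50 = 100. -->
  <property><name>yarn.scheduler.capacity.root.a.a1.capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.a.a2.capacity</name><value>50</value></property>

  <!-- Shares of queue b's capacity: 50 + 30 + 20 = 100. -->
  <property><name>yarn.scheduler.capacity.root.b.b1.capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.b.b2.capacity</name><value>30</value></property>
  <property><name>yarn.scheduler.capacity.root.b.b3.capacity</name><value>20</value></property>

  <!-- Queue c may borrow idle capacity, but never beyond 50% of the cluster. -->
  <property><name>yarn.scheduler.capacity.root.c.maximum-capacity</name><value>50</value></property>

  <!-- A single user in queue c may use up to 2x the queue's configured capacity. -->
  <property><name>yarn.scheduler.capacity.root.c.user-limit-factor</name><value>2</value></property>
</configuration>
```

With these numbers, a1 is guaranteed 50% of a's 40%, which is 20% of the cluster, while still being able to use idle capacity from its siblings when they are not busy.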
3. Fair Scheduler
The Fair Scheduler aims to allocate resources evenly among all applications, with fairness tunable via weight attributes. Its configuration resides in fair-scheduler.xml, whose location can be set with yarn.scheduler.fair.allocation.file. If the file is absent, a queue is created per user on first submission.
Queue hierarchy is expressed with nested <queue> elements, and each queue can have its own weight, minimum/maximum resources, and scheduling policy (default is fair, but FIFO can be specified per queue). Example configuration:
<allocations>
  <queue name="sample_queue">
    <minResources>10000 mb,0vcores</minResources>
    <maxResources>90000 mb,0vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <maxAMShare>0.1</maxAMShare>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <queue name="sample_sub_queue">
      <aclSubmitApps>charlie</aclSubmitApps>
      <minResources>5000 mb,0vcores</minResources>
    </queue>
    <queue name="sample_reservable_queue">
      <reservation/>
    </queue>
  </queue>
  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
  <queueMaxResourcesDefault>40000 mb,0vcores</queueMaxResourcesDefault>
  <queue name="secondary_group_queue" type="parent">
    <weight>3.0</weight>
    <maxChildResources>4096 mb,4vcores</maxChildResources>
  </queue>
  <user name="sample_user">
    <maxRunningApps>30</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>
  <queuePlacementPolicy>
    <rule name="specified"/>
    <rule name="primaryGroup" create="false"/>
    <rule name="nestedUserQueue">
      <rule name="secondaryGroupExistingQueue" create="false"/>
    </rule>
    <rule name="default" queue="sample_queue"/>
  </queuePlacementPolicy>
</allocations>
Queue placement rules determine how jobs are assigned to queues, falling back to a default queue when no specific rule matches. Administrators can disable automatic user-queue creation with yarn.scheduler.fair.user-as-default-queue=false and prevent undeclared queues with yarn.scheduler.fair.allow-undeclared-pools=false.
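The settings named above live in yarn-site.xml rather than in the allocation file. As a rough sketch, the fragment below shows how they might be combined. The path /etc/hadoop/conf/fair-scheduler.xml is a hypothetical location chosen for illustration; the property names are the ones mentioned in the text.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml (fragment): Fair Scheduler settings discussed above. -->
<configuration>
  <!-- Location of the allocation file; this path is a hypothetical example. -->
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>

  <!-- Do not use the submitting user's name as the default queue name. -->
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>false</value>
  </property>

  <!-- Only queues declared in the allocation file may be used;
       none are created on the fly at submission time. -->
  <property>
    <name>yarn.scheduler.fair.allow-undeclared-pools</name>
    <value>false</value>
  </property>
</configuration>
```

With both flags set to false, an application that names no queue, or names one that is not declared, should end up in the default queue instead of a newly created per-user queue. Note that when a queuePlacementPolicy is present in the allocation file, its rules take over placement decisions.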
3.2 Preemption
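Fair Scheduler supports preemption to make job start times more predictable. When a job is submitted to a busy cluster, it may otherwise wait until running jobs release resources. With preemption, the scheduler can kill containers belonging to queues that are running above their fair share and reallocate the freed resources to under-served queues. Because killed containers must be re-executed, preemption trades some overall cluster efficiency for responsiveness. Preemption is enabled with yarn.scheduler.fair.preemption=true in yarn-site.xml and tuned in the allocation file. The top-level elements <defaultMinSharePreemptionTimeout> and <defaultFairSharePreemptionTimeout> set cluster-wide timeouts, which individual queues can override with <minSharePreemptionTimeout> and <fairSharePreemptionTimeout>. Likewise, <defaultFairSharePreemptionThreshold> sets the cluster-wide threshold and <fairSharePreemptionThreshold> overrides it per queue.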
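To make the interplay of these settings concrete, here is a hedged sketch split across the two files involved. The queue names prod and dev and every numeric value are hypothetical, and the timeouts are assumed to be in seconds, as in the Apache Hadoop Fair Scheduler documentation. First, the switch in yarn-site.xml:

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml (fragment): turns Fair Scheduler preemption on. -->
<configuration>
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>
</configuration>
```

Then the timeouts and thresholds in the allocation file:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml (fragment); queue names and values are illustrative. -->
<allocations>
  <queue name="prod">
    <weight>2.0</weight>
    <minResources>10000 mb,10vcores</minResources>
    <!-- Per-queue overrides: preempt sooner on behalf of this queue. -->
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>120</fairSharePreemptionTimeout>
    <!-- Treat the queue as starved when it holds less than 80% of its fair share. -->
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
  </queue>
  <queue name="dev">
    <weight>1.0</weight>
  </queue>

  <!-- Cluster-wide defaults for queues that set no value of their own. -->
  <defaultMinSharePreemptionTimeout>300</defaultMinSharePreemptionTimeout>
  <defaultFairSharePreemptionTimeout>600</defaultFairSharePreemptionTimeout>
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
</allocations>
```

Under these assumptions, prod would trigger preemption after one minute below its minimum share or two minutes below 80% of its fair share, while dev falls back to the more lenient cluster-wide defaults. The exact behavior of min-share preemption varies between Hadoop releases, so the values are worth checking against the documentation for the version in use.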
References include Hadoop Capacity Scheduler documentation and articles on dynamic container resource adjustment.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
