Understanding Hadoop YARN Schedulers: FIFO, Capacity, and Fair Scheduler

This article explains the role of YARN's Scheduler, compares the FIFO, Capacity, and Fair schedulers, and walks through their configuration, including XML snippets for the Capacity and Fair schedulers, queue hierarchies, and preemption settings, as practical guidance for resource allocation in Hadoop clusters.

Big Data Technology & Architecture

Dec 20, 2019

In Hadoop YARN, the Scheduler is responsible for allocating cluster resources to applications. The scheduling policy is pluggable, and YARN ships with several implementations.

Three main schedulers are available: FIFO Scheduler, which processes applications in submission order without configuration; Capacity Scheduler, suitable for shared clusters by defining hierarchical queues and allowing both large and small jobs to obtain resources; and Fair Scheduler, which dynamically shares resources among jobs and can also support FIFO within queues.
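As a minimal sketch of how one of these pluggable strategies is selected: the ResourceManager reads the scheduler class from yarn.resourcemanager.scheduler.class in yarn-site.xml. The fragment below picks the Fair Scheduler. The source article does not show this property, so treat it as an illustration rather than the author's own configuration.

```xml
<!-- yarn-site.xml: choose the scheduler implementation used by the ResourceManager -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- Alternatives end in scheduler.capacity.CapacityScheduler or scheduler.fifo.FifoScheduler -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

Changing the scheduler class generally requires restarting the ResourceManager, unlike the per-scheduler queue settings discussed later.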

A comparison diagram in the original article (not shown) illustrates the trade-offs. Under FIFO, small jobs can be blocked behind large ones. Under Capacity, a dedicated queue lets small jobs start promptly, but the capacity held back for that queue may slow large jobs down. Under Fair, resources are rebalanced dynamically among running jobs, though a newly submitted job may wait briefly until resources are released to it.

2. Capacity Scheduler Configuration

The Capacity Scheduler enables multiple organizations to share a cluster by assigning each a portion of resources through hierarchical queues. Each queue can be given a percentage of its parent's resources, a maximum resource limit, and per-user limits. The following properties define a sample queue hierarchy:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b,c</value>
  <description>The queues at this level (root is the root queue).</description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>a1,a2</value>
  <description>The child queues of root.a.</description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.b.queues</name>
  <value>b1,b2,b3</value>
  <description>The child queues of root.b.</description>
</property>

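Each queues property takes a comma-separated list of child queue names, and each child queue can specify its share of its parent's resources. Administrators can apply changes to capacity-scheduler.xml at runtime, without restarting the ResourceManager, by running yarn rmadmin -refreshQueues.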
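The share each queue receives is set per queue path. The fragment below is a sketch, assuming the a, b, and c queues defined above; the percentages are illustrative and do not come from the original article. Capacities of sibling queues are expected to add up to 100.

```xml
<!-- capacity-scheduler.xml: illustrative shares for the root-level queues -->
<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.c.capacity</name>
  <value>20</value>
</property>
<!-- Let queue a borrow idle capacity, up to 60% of its parent -->
<property>
  <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
  <value>60</value>
</property>
<!-- Cap a single user at 1x the configured capacity of queue c -->
<property>
  <name>yarn.scheduler.capacity.root.c.user-limit-factor</name>
  <value>1</value>
</property>
```

The child queues (a1, a2, b1, b2, b3) would need their own capacity entries in the same way, each set of siblings again adding up to 100.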

3. Fair Scheduler

The Fair Scheduler aims to allocate resources evenly among all applications, with fairness tunable via weight attributes. Its configuration resides in fair-scheduler.xml, whose location can be set with yarn.scheduler.fair.allocation.file. If the file is absent, a queue is created per user on first submission.
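A short sketch of pointing the scheduler at its allocation file from yarn-site.xml follows. The path is a hypothetical example, not one given in the original article.

```xml
<!-- yarn-site.xml: tell the Fair Scheduler where its allocation file lives -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <!-- Hypothetical path; adjust to your installation -->
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```

The scheduler reloads the allocation file periodically, so queue changes made there normally take effect without a restart.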

Queue hierarchy is expressed with nested <queue> elements, and each queue can have its own weight, minimum/maximum resources, and scheduling policy (default is fair, but FIFO can be specified per queue). Example configuration:

<allocations>
  <queue name="sample_queue">
    <minResources>10000 mb,0vcores</minResources>
    <maxResources>90000 mb,0vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <maxAMShare>0.1</maxAMShare>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <queue name="sample_sub_queue">
      <aclSubmitApps>charlie</aclSubmitApps>
      <minResources>5000 mb,0vcores</minResources>
    </queue>
    <queue name="sample_reservable_queue">
      <reservation/>
    </queue>
  </queue>
  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
  <queueMaxResourcesDefault>40000 mb,0vcores</queueMaxResourcesDefault>
  <queue name="secondary_group_queue" type="parent">
    <weight>3.0</weight>
    <maxChildResources>4096 mb,4vcores</maxChildResources>
  </queue>
  <user name="sample_user">
    <maxRunningApps>30</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>
  <queuePlacementPolicy>
    <rule name="specified"/>
    <rule name="primaryGroup" create="false"/>
    <rule name="nestedUserQueue">
      <rule name="secondaryGroupExistingQueue" create="false"/>
    </rule>
    <rule name="default" queue="sample_queue"/>
  </queuePlacementPolicy>
</allocations>
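For comparison, a much smaller allocation file can isolate the weight and per-queue scheduling policy described earlier. This is a sketch with hypothetical queue names (prod, dev, eng, science) that do not appear in the original article.

```xml
<allocations>
  <!-- With both queues busy, prod is entitled to 3 parts of the cluster for every 1 part of dev -->
  <queue name="prod">
    <weight>3.0</weight>
    <!-- Applications inside prod run in submission order instead of sharing fairly -->
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
  <queue name="dev">
    <weight>1.0</weight>
    <!-- Nested queues; with no weights set, they split the share of dev evenly -->
    <queue name="eng"/>
    <queue name="science"/>
  </queue>
</allocations>
```

Weights are relative, so 3.0 and 1.0 behave the same as 75 and 25.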

Queue placement rules determine how jobs are assigned to queues, falling back to a default queue when no specific rule matches. Administrators can disable automatic user‑queue creation with yarn.scheduler.fair.user-as-default-queue=false and prevent undeclared queues with yarn.scheduler.fair.allow-undeclared-pools=false.
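The two switches named above are set in yarn-site.xml. A minimal sketch:

```xml
<!-- yarn-site.xml: restrict how Fair Scheduler queues come into existence -->
<property>
  <!-- Do not fall back to a queue named after the submitting user -->
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
<property>
  <!-- Only queues declared in the allocation file may be used -->
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>
```

As far as the Fair Scheduler documentation describes it, both properties are ignored when the allocation file contains a queuePlacementPolicy element, as the example above does; in that case the create attribute on each rule governs queue creation instead.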

3.2 Preemption

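Fair Scheduler supports preemption to make job start times more predictable. When a job is submitted to a busy cluster, it normally waits until running containers finish and free resources. With preemption enabled, the scheduler can kill containers that belong to queues running over their fair share and reallocate the freed resources to under-served queues. Killed containers have to be re-executed, so preemption trades some overall efficiency for predictability. Preemption is enabled by setting yarn.scheduler.fair.preemption to true in yarn-site.xml. In the allocation file, <defaultMinSharePreemptionTimeout> and <defaultFairSharePreemptionTimeout> set cluster-wide defaults, and <minSharePreemptionTimeout> and <fairSharePreemptionTimeout> set per-queue values, for how many seconds a queue may stay below its minimum share or fair share before it preempts. Thresholds such as <defaultFairSharePreemptionThreshold> define the fraction of the fair share below which a queue counts as under-served.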
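The allocation-file elements listed above might be combined as in the following sketch. All numeric values are illustrative assumptions rather than recommendations from the original article, and yarn.scheduler.fair.preemption must still be set to true in yarn-site.xml for any of them to take effect.

```xml
<allocations>
  <!-- Cluster-wide defaults; timeouts are in seconds -->
  <defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
  <defaultFairSharePreemptionTimeout>120</defaultFairSharePreemptionTimeout>
  <!-- A queue below 50% of its fair share is treated as under-served -->
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>

  <queue name="sample_queue">
    <!-- Per-queue overrides: this queue may preempt sooner than the defaults allow -->
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
  </queue>
</allocations>
```

Shorter timeouts make job start times more predictable at the cost of more killed containers, so the values are worth tuning against how expensive re-executed work is on a given cluster.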

References include Hadoop Capacity Scheduler documentation and articles on dynamic container resource adjustment.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
