Understanding YARN: Background, Architecture, and Execution Process

This article explains why YARN was created to overcome the limitations of MapReduce 1.x, describes its architecture—including ResourceManager, NodeManager, ApplicationMaster, Container, and Client—and outlines the step‑by‑step execution flow that enables multiple computation frameworks to run on Hadoop.

Big Data Technology & Architecture

Apr 7, 2019

YARN Background

YARN (Yet Another Resource Negotiator) was introduced to address several shortcomings of Hadoop MapReduce 1.x, such as a single point of failure, limited scalability of the JobTracker, and incompatibility with other processing frameworks like Spark.

MapReduce 1.x Architecture

In Hadoop 1.x, the architecture follows a master‑slave model where a single JobTracker (JT) manages resources and schedules jobs, while multiple TaskTrackers (TT) report their status and execute tasks.

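Problems with the 1.x Design

Because the single JT handles both resource management and the scheduling and monitoring of every job, the design has three main problems: the JT is a single point of failure, the load on the JT grows with the size of the cluster, and the cluster cannot run frameworks other than MapReduce.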

YARN Overview

YARN acts as a generic resource‑management layer that allows various computation frameworks to share the same Hadoop cluster and the data stored in its HDFS. It provides unified resource allocation and scheduling, much as an operating system does for the programs running on one machine.
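The idea of one resource pool shared by several frameworks can be illustrated with a small toy model. This is only a sketch: `ClusterPool` and its `request` method are hypothetical names invented for this example and are not part of any Hadoop API, and the capacity figures are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterPool:
    """Hypothetical stand-in for the cluster-wide resources that YARN manages."""
    memory_mb: int
    vcores: int
    grants: list = field(default_factory=list)

    def request(self, framework: str, memory_mb: int, vcores: int) -> bool:
        """Grant the request only if both memory and CPU are still available."""
        if memory_mb > self.memory_mb or vcores > self.vcores:
            return False
        self.memory_mb -= memory_mb
        self.vcores -= vcores
        self.grants.append((framework, memory_mb, vcores))
        return True


# Jobs from different frameworks draw on the same pool.
pool = ClusterPool(memory_mb=16384, vcores=8)
results = [
    pool.request("MapReduce", 4096, 2),
    pool.request("Spark", 8192, 4),
    pool.request("Spark", 8192, 4),  # exceeds what is left, so it is refused
]
print(results)                      # [True, True, False]
print(pool.memory_mb, pool.vcores)  # 4096 2
```

The pool does not care which framework is asking, only whether the requested memory and CPU are free. That is the sense in which the resource layer is generic.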

YARN Architecture Components

ResourceManager (RM): the single cluster‑wide manager. It allocates resources, accepts job submissions, and monitors the NodeManagers.

NodeManager (NM): runs on each node, manages that node's resources, sends heartbeats to the RM, and launches containers on request.

ApplicationMaster (AM): one per application (e.g., a Spark or MapReduce job). It negotiates resources with the RM and coordinates the execution of the application's tasks.

Container: an abstract execution environment that bundles CPU, memory, and other resources for a task.

Client: submits jobs, monitors their progress, and can kill them.
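The relationship between these components can be sketched as a small data model. The class and method names below mirror the component names but are hypothetical; they are not the classes of the Hadoop YARN API. The sketch also assumes a simple first‑fit placement rule and, for brevity, collapses allocating a container and launching it into one step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


@dataclass
class Container:
    """A bundle of resources on one node, reserved for a single task."""
    container_id: int
    node: str
    resource: Resource


@dataclass
class NodeManager:
    name: str
    capacity: Resource
    containers: List[Container] = field(default_factory=list)

    def free(self) -> Resource:
        used_mem = sum(c.resource.memory_mb for c in self.containers)
        used_cpu = sum(c.resource.vcores for c in self.containers)
        return Resource(self.capacity.memory_mb - used_mem,
                        self.capacity.vcores - used_cpu)

    def launch(self, container: Container) -> None:
        self.containers.append(container)

    def heartbeat(self) -> Dict[str, object]:
        """Status report that the node sends to the ResourceManager."""
        free = self.free()
        return {"node": self.name, "running": len(self.containers),
                "free_mb": free.memory_mb, "free_vcores": free.vcores}


class ResourceManager:
    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes
        self._next_id = 1

    def allocate(self, wanted: Resource) -> Optional[Container]:
        """First-fit (an assumption): use the first node with enough free resources."""
        for nm in self.nodes:
            free = nm.free()
            if free.memory_mb >= wanted.memory_mb and free.vcores >= wanted.vcores:
                container = Container(self._next_id, nm.name, wanted)
                self._next_id += 1
                nm.launch(container)
                return container
        return None  # no node can satisfy the request right now


rm = ResourceManager([
    NodeManager("node1", Resource(4096, 2)),
    NodeManager("node2", Resource(8192, 4)),
])
am = rm.allocate(Resource(1024, 1))        # fits on node1
task = rm.allocate(Resource(4096, 2))      # node1 has 3072 MB left, so node2
too_big = rm.allocate(Resource(16384, 8))  # larger than any node
print(am.node, task.node, too_big)         # node1 node2 None
print(rm.nodes[0].heartbeat())
```

Real schedulers use queues and more elaborate placement policies; first‑fit is used here only to keep the example short.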

Analogy with Company Management

The Client resembles a customer, the RM is the boss, NMs are department heads, the AM is a project manager, and Containers are work groups within departments.

YARN Execution Flow

The client submits a job request (MapReduce, Spark, etc.) to the RM.

The RM allocates the first container on a chosen node and instructs the corresponding NM to launch it.

The NM starts the container, inside which the ApplicationMaster runs.

The AM registers with the RM, reports its progress, and requests additional resources (memory, CPU).

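After resources are granted, the AM contacts the corresponding NMs directly and asks them to launch containers for the tasks.

Each NM starts the requested containers and runs the tasks. The tasks report their progress to the AM, while the NM reports container status to the RM through its heartbeats.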
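The order of messages in this flow can be traced with a toy function. This is a sketch only: `run_application`, the node names, and the message strings are hypothetical, and no real RPC takes place. It assumes that the AM asks the NodeManagers directly to start task containers once the RM has granted them, which is how YARN assigns that responsibility.

```python
from typing import List


def run_application(name: str, task_nodes: List[str],
                    am_node: str = "node1") -> List[str]:
    """Return the ordered message trace for one application (toy model)."""
    trace = [
        f"Client -> RM: submit {name}",
        f"RM -> NM[{am_node}]: launch container for ApplicationMaster",
        f"NM[{am_node}]: start ApplicationMaster for {name}",
        "AM -> RM: register",
        f"AM -> RM: request {len(task_nodes)} containers",
        f"RM -> AM: grant containers on {', '.join(task_nodes)}",
    ]
    # The AM talks to each NodeManager itself to start the task containers.
    for node in task_nodes:
        trace.append(f"AM -> NM[{node}]: launch task container")
        trace.append(f"NM[{node}]: run task")
    return trace


for line in run_application("wordcount", ["node2", "node3"]):
    print(line)
```

The RM appears only in the submission, registration, and resource‑negotiation messages. Once containers are granted, the per‑application AM drives the work, which keeps task‑level coordination off the central manager.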

This straightforward process, combined with a generic resource‑management layer, enables YARN to support many computation frameworks, effectively turning Hadoop into a versatile, multi‑framework platform.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
