Comparison of Flink and Spark in Standalone and YARN Deployment Modes

This article compares Apache Flink and Apache Spark in both standalone and YARN deployment modes, detailing their architecture, job scheduling differences, and specific configurations such as Flink’s yarn‑cluster and yarn‑session modes versus Spark’s yarn‑client and yarn‑cluster modes.

Big Data Technology & Architecture

Jun 19, 2020

This article analyzes and compares the deployment and execution models of Apache Flink and Apache Spark in both standalone and YARN (on‑YARN) modes.

Standalone Mode

Both Flink and Spark support a standalone deployment that does not rely on an external resource manager; each starts its own cluster of a master and worker processes (JobManager and TaskManagers in Flink, Master and Workers in Spark) to schedule and run applications.
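As a rough illustration of submitting to a standalone cluster, the Python sketch below assembles command lines that point each client directly at its own master. The host names, jar paths, and class name are hypothetical placeholders. The ports are the usual defaults (7077 for the Spark master, 8081 for the Flink JobManager REST endpoint) and may differ in a given installation.

```python
def spark_standalone_cmd(master_host, app_jar, main_class, port=7077):
    """Build a spark-submit command targeting a standalone Spark master.

    7077 is assumed to be the master port; adjust for your cluster.
    """
    return [
        "spark-submit",
        "--master", f"spark://{master_host}:{port}",
        "--class", main_class,
        app_jar,
    ]


def flink_standalone_cmd(jobmanager_host, job_jar, port=8081):
    """Build a flink run command targeting a standalone JobManager.

    8081 is assumed to be the JobManager REST port; adjust for your cluster.
    """
    return ["flink", "run", "-m", f"{jobmanager_host}:{port}", job_jar]


if __name__ == "__main__":
    # Hypothetical hosts and artifacts, for illustration only.
    print(" ".join(spark_standalone_cmd("spark-master", "app.jar", "com.example.Main")))
    print(" ".join(flink_standalone_cmd("flink-jm", "job.jar")))
```

In both cases the client addresses the framework's own master process, and no external resource manager is involved.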

[Figure: Flink standalone cluster architecture]

[Figure: Spark standalone cluster architecture]

On‑YARN Mode

Flink on YARN runs the JobManager inside the YARN ApplicationMaster (AM). The JobManager, hosted in the AM, creates the execution graph, distributes tasks, and handles results. The YarnTaskManager extends the regular TaskManager and runs the actual tasks.

Spark on YARN can run in two modes: yarn‑client and yarn‑cluster, which differ mainly in where the driver process executes.

Flink on YARN

yarn‑cluster mode: Flink submits the application to YARN as a single job (similar to a MapReduce job); when the job finishes, the YARN application terminates and its containers are released.

yarn‑session mode: First a long‑running empty application is submitted to YARN. After it starts, the ApplicationMaster, JobManager, and N YarnTaskManager containers are launched, but no tasks run yet. Other Flink clients can later submit jobs to this JobManager by specifying its ApplicationId.
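The two Flink submission styles above can be sketched as command lines. This is a minimal Python sketch that assumes the CLI flags of Flink releases from around the time of the article (`-m yarn-cluster` for a per-job cluster, `yarn-session.sh -d` to start a detached session, and `-yid` to attach to a running session). Newer Flink releases select the deployment target differently, so check the flags against your version. The jar name and application ID are hypothetical.

```python
def flink_per_job_cmd(job_jar):
    """yarn-cluster mode: one YARN application per Flink job."""
    return ["flink", "run", "-m", "yarn-cluster", job_jar]


def flink_session_start_cmd():
    """yarn-session mode, step 1: start a long-running, initially idle session.

    -d detaches the client once the session has been submitted.
    """
    return ["yarn-session.sh", "-d"]


def flink_session_submit_cmd(application_id, job_jar):
    """yarn-session mode, step 2: submit a job to the running session.

    The session is identified by its YARN ApplicationId.
    """
    if not application_id.startswith("application_"):
        raise ValueError("expected a YARN ApplicationId such as application_<ts>_<seq>")
    return ["flink", "run", "-yid", application_id, job_jar]


if __name__ == "__main__":
    # Hypothetical ApplicationId, for illustration only.
    app_id = "application_1592550000000_0001"
    print(" ".join(flink_per_job_cmd("job.jar")))
    print(" ".join(flink_session_start_cmd()))
    print(" ".join(flink_session_submit_cmd(app_id, "job.jar")))
```

The per-job form needs a single command because the cluster lives and dies with the job. The session form separates starting the cluster from submitting work to it, which is why the ApplicationId has to be passed along on each later submission.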

Spark on YARN

The main difference between yarn‑client and yarn‑cluster modes is where the driver runs.

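In yarn‑client mode, the driver and the application's business logic run inside the YARN client process, which interacts with the ApplicationMaster and the executors to schedule and execute the application.
In yarn‑cluster mode, the YARN client process can exit once the application has been submitted to the YARN cluster; the driver and the business logic run inside the ApplicationMaster process, which works with the executors to schedule and execute the application.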
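The difference between the two Spark modes can be sketched in Python as follows. The command builder uses `--master yarn` together with `--deploy-mode`, which is how current `spark-submit` expresses what the article calls yarn‑client and yarn‑cluster. The helper names, jar, and class name are hypothetical, and the two lookup functions simply restate the description above rather than query a real cluster.

```python
VALID_DEPLOY_MODES = ("client", "cluster")


def spark_yarn_cmd(app_jar, main_class, deploy_mode):
    """Build a spark-submit command for YARN in the given deploy mode."""
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"deploy_mode must be one of {VALID_DEPLOY_MODES}")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
    ]


def driver_location(deploy_mode):
    """Return the process that hosts the driver, per the description above."""
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"deploy_mode must be one of {VALID_DEPLOY_MODES}")
    return {
        "client": "YARN client process",
        "cluster": "ApplicationMaster process",
    }[deploy_mode]


def client_can_exit_after_submit(deploy_mode):
    """Only in cluster mode can the submitting client exit after submission."""
    return driver_location(deploy_mode) == "ApplicationMaster process"


if __name__ == "__main__":
    for mode in VALID_DEPLOY_MODES:
        print(" ".join(spark_yarn_cmd("app.jar", "com.example.Main", mode)))
        print("  driver runs in:", driver_location(mode))
        print("  client may exit:", client_can_exit_after_submit(mode))
```

A practical consequence is that client mode suits interactive work, where the driver's output should appear on the submitting machine, while cluster mode suits jobs that must keep running after the submitting client disconnects.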

[Figure: Spark yarn‑client mode]

[Figure: Spark yarn‑cluster mode]

Below is a visual comparison of the core functions of Flink and Spark when running on YARN.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
