Deploying Apache Flink on YARN and Running Flink Jobs

This tutorial explains how to deploy Apache Flink on a Hadoop YARN cluster, covering both YARN session mode and direct job submission, and demonstrates running the built‑in WordCount example with command‑line options for input, output, and resource configuration.

Big Data Technology & Architecture

Jan 3, 2019

Apache Flink is an efficient, distributed, general‑purpose big‑data processing engine written in Java and Scala that supports both batch and stream processing.

According to the official documentation, Flink can be deployed in three modes: Local, Cluster, and Cloud.

This article shows how to deploy Flink on YARN (based on Flink 1.0.0 and Hadoop 2.2.0) and describes two ways to start a Flink job on YARN: launching a long‑running YARN session and submitting a job directly.

Flink YARN Session

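In session mode, a long‑running YARN session is started, which launches a JobManager and one or more TaskManagers; jobs are then submitted to that session. The session is started with the ./bin/yarn-session.sh script. Its most important options include -n,--container (the number of YARN containers to allocate, which equals the number of TaskManagers), -tm,--taskManagerMemory (memory per TaskManager container, in MB), -s,--slots (the number of slots per TaskManager), and -nm,--name (a custom name for the application on YARN).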

Example command to start a session with four TaskManagers, each with 8 GB of memory and eight slots:

./bin/yarn-session.sh -n 4 -tm 8192 -s 8

Further settings can be adjusted as needed in the configuration file conf/flink-conf.yaml.
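As a rough sanity check on the sizing of a session like this one, the total number of slots and the total TaskManager memory requested from YARN can be worked out before launching. The sketch below is a dry run under stated assumptions: the helper name session_cmd and the variable names are hypothetical, and the script only prints the command instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper: print the yarn-session.sh command for a given sizing.
# Arguments: number of TaskManagers, memory per TaskManager (MB),
# slots per TaskManager.
session_cmd() {
    echo "./bin/yarn-session.sh -n $1 -tm $2 -s $3"
}

TASK_MANAGERS=4
TM_MEMORY_MB=8192
SLOTS_PER_TM=8

# Totals across all TaskManagers in the session.
TOTAL_SLOTS=$((TASK_MANAGERS * SLOTS_PER_TM))
TOTAL_TM_MEMORY_MB=$((TASK_MANAGERS * TM_MEMORY_MB))

echo "total slots: $TOTAL_SLOTS"
echo "total TaskManager memory (MB): $TOTAL_TM_MEMORY_MB"
session_cmd "$TASK_MANAGERS" "$TM_MEMORY_MB" "$SLOTS_PER_TM"
```

With the values above, the session offers 32 slots and asks for 32768 MB of TaskManager memory. The JobManager runs in an additional container, so the YARN queue needs somewhat more headroom than these totals suggest.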

To run a Flink job, use the ./bin/flink script. The “run” action accepts options such as -c,--class to specify the entry class, -p,--parallelism to set parallelism, -d,--detached to run detached, etc.
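To illustrate how these options combine, here is a minimal dry‑run sketch. The helper run_cmd, the jar path ./my-job.jar, and the class name com.example.MyJob are all hypothetical placeholders, not part of Flink or of the original article; the script only prints the command line it would run.

```shell
#!/bin/sh
# Hypothetical helper: assemble a detached "flink run" command line.
# Arguments: entry class, parallelism, path to the job jar.
run_cmd() {
    entry_class=$1
    parallelism=$2
    jar=$3
    # Options for the "run" action go before the jar path.
    echo "./bin/flink run -d -c $entry_class -p $parallelism $jar"
}

# Placeholder class and jar; substitute the names from your own project.
run_cmd com.example.MyJob 16 ./my-job.jar
```

Anything placed after the jar path is handed to the program itself as its own arguments, which is why the Flink options come first.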

Example using the built‑in WordCount program:

hadoop fs -copyFromLocal LICENSE hdfs:///user/iteblog/

./bin/flink run ./examples/batch/WordCount.jar --input hdfs:///user/iteblog/LICENSE

The job prints word counts to the console; the output can be redirected to HDFS with the --output option:

./bin/flink run ./examples/batch/WordCount.jar \
--input hdfs:///user/iteblog/LICENSE \
--output hdfs:///user/iteblog/result.txt

Note that --input and --output are parameters defined by the WordCount program, not Flink core options, and HDFS URIs must include the scheme.
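The scheme requirement can be checked before a job is submitted. The following is a minimal sketch using a hypothetical helper, has_hdfs_scheme; it only tests for the hdfs:// prefix and does not contact the cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument starts with hdfs://.
has_hdfs_scheme() {
    case "$1" in
        hdfs://*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check one path with the scheme and one without it.
for path in "hdfs:///user/iteblog/LICENSE" "/user/iteblog/LICENSE"; do
    if has_hdfs_scheme "$path"; then
        echo "ok: $path"
    else
        echo "missing scheme: $path"
    fi
done
```

In a URI such as hdfs:///user/iteblog/LICENSE, the authority between the second and third slash is empty, so the default NameNode from the Hadoop configuration is used. Writing only two slashes would make the first path component be read as the NameNode host.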

Run a Single Flink Job on YARN

It is also possible to submit a job directly to YARN without a pre‑started session, e.g.:

./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar \
--input hdfs:///user/iteblog/LICENSE \
--output hdfs:///user/iteblog/result.txt

The -yn flag specifies the number of TaskManagers.
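When the same job is submitted repeatedly with different sizes, the command can be parameterized. This is only a dry‑run sketch: submit_cmd and BASE are hypothetical names introduced for illustration, and the script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print a per-job YARN submission of the WordCount example.
# Arguments: number of TaskManagers, input URI, output URI.
submit_cmd() {
    echo "./bin/flink run -m yarn-cluster -yn $1 ./examples/batch/WordCount.jar --input $2 --output $3"
}

# Base directory on HDFS; the scheme is part of the URI.
BASE=hdfs:///user/iteblog

submit_cmd 2 "$BASE/LICENSE" "$BASE/result.txt"
```

Unlike session mode, each submission of this kind brings up its own Flink cluster on YARN for that one job.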

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
