Deploying Apache Flink on YARN and Running Flink Jobs
This tutorial explains how to deploy Apache Flink on a Hadoop YARN cluster, covering both YARN session mode and direct job submission, and demonstrates running the built‑in WordCount example with command‑line options for input, output, and resource configuration.
Apache Flink is an efficient, distributed, Java/Scala‑based general‑purpose big‑data analysis engine that supports batch and stream processing.
According to the official documentation, Flink can be deployed in three modes: Local, Cluster, and Cloud.
This article shows how to deploy Flink on YARN (based on Flink 1.0.0 and Hadoop 2.2.0) and describes two ways to start a Flink job on YARN: launching a long‑running YARN session and submitting a job directly.
Flink YARN Session
In session mode a YARN session is started, which launches a JobManager and one or more TaskManagers. The session is started with the ./bin/yarn-session.sh script. Its most important options include -n,--container (the number of TaskManagers), -tm,--taskManagerMemory (memory per TaskManager, in MB), -s,--slots (the number of slots per TaskManager), and -nm,--name (the application name shown on YARN).
Example command to start a session with four TaskManagers, each with 8 GB memory and eight slots:

    ./bin/yarn-session.sh -n 4 -tm 8192 -s 8

After the session starts, the configuration file conf/flink-conf.yaml can be edited as needed.
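The sizing in the session example can be sanity-checked with simple arithmetic: the session offers (number of TaskManagers) x (slots per TaskManager) task slots in total. The following is a minimal sketch of that calculation. The function names are hypothetical helpers, not part of Flink, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper names; only the flags -n, -tm and -s come from the article.

# Total task slots offered by a session: TaskManagers x slots per TaskManager.
total_slots() {
  echo $(( $1 * $2 ))
}

# Total TaskManager memory in MB (the JobManager container is not included).
total_tm_memory_mb() {
  echo $(( $1 * $2 ))
}

# Print (do not execute) the yarn-session.sh command for the given sizing.
session_cmd() {
  printf './bin/yarn-session.sh -n %s -tm %s -s %s\n' "$1" "$2" "$3"
}

session_cmd 4 8192 8
echo "slots: $(total_slots 4 8)"
echo "TaskManager memory (MB): $(total_tm_memory_mb 4 8192)"
```

For the example above this gives 32 slots and 32768 MB of TaskManager memory, so the YARN cluster needs at least that much free capacity, plus the JobManager container, before the session can start.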
To run a Flink job, use the ./bin/flink script. The “run” action accepts options such as -c,--class to specify the entry class, -p,--parallelism to set parallelism, -d,--detached to run detached, etc.
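To illustrate how these options fit together, here is a small sketch that assembles a "flink run" command line. Options go before the JAR file, and anything after the JAR is passed to the program itself. The helper run_cmd, the class com.example.MyJob, and the path path/to/job.jar are made up for illustration; the script only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: prints a "flink run" command line, nothing is executed.
# Arguments: jar path, parallelism (may be empty), entry class (may be empty),
# detached ("yes" adds -d).
run_cmd() {
  jar="$1"; parallelism="$2"; entry_class="$3"; detached="$4"
  cmd="./bin/flink run"
  if [ -n "$parallelism" ]; then cmd="$cmd -p $parallelism"; fi
  if [ -n "$entry_class" ]; then cmd="$cmd -c $entry_class"; fi
  if [ "$detached" = "yes" ]; then cmd="$cmd -d"; fi
  echo "$cmd $jar"
}

# Built-in example with parallelism 16.
run_cmd ./examples/batch/WordCount.jar 16 "" ""
# Placeholder job with an explicit entry class, submitted detached.
run_cmd path/to/job.jar 8 com.example.MyJob yes
```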
Example using the built‑in WordCount program:
    hadoop fs -copyFromLocal LICENSE hdfs:///user/iteblog/
    ./bin/flink run ./examples/batch/WordCount.jar --input hdfs:///user/iteblog/LICENSE

The job prints word counts to the console; the output can be redirected to HDFS with the --output option:

    ./bin/flink run ./examples/batch/WordCount.jar \
        --input hdfs:///user/iteblog/LICENSE \
        --output hdfs:///user/iteblog/result.txt

Note that --input and --output are parameters defined by the WordCount program, not Flink core options, and HDFS URIs must include the scheme (hdfs://). The third slash matters: in hdfs://user/iteblog/ the word "user" would be parsed as the NameNode host rather than as part of the path.
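Since a path without the scheme is an easy mistake to make, a wrapper script could check it before submitting. The sketch below shows one way to do that; has_hdfs_scheme is a hypothetical helper, not a Flink or Hadoop command. The sample paths use the empty-authority form hdfs:///..., which relies on the default NameNode from the Hadoop configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the argument starts with the hdfs:// scheme.
has_hdfs_scheme() {
  case "$1" in
    hdfs://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check one path with the scheme and one without it.
for path in hdfs:///user/iteblog/LICENSE /user/iteblog/LICENSE; do
  if has_hdfs_scheme "$path"; then
    echo "ok: $path"
  else
    echo "missing scheme: $path"
  fi
done
```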
Run a Single Flink Job on YARN
It is also possible to submit a job directly to YARN without a pre‑started session, e.g.:
    ./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar \
        --input hdfs:///user/iteblog/LICENSE \
        --output hdfs:///user/iteblog/result.txt

The -yn flag specifies the number of TaskManagers.
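The structure of the direct submission can be sketched the same way: the YARN options (-m yarn-cluster, -yn) come first, then the JAR, then the program's own arguments. The helper yarn_job_cmd below is a hypothetical name used only for illustration, and it prints the command rather than submitting anything.

```shell
#!/bin/sh
# Hypothetical helper: prints a per-job YARN submission command.
# Arguments: number of TaskManagers, jar path, then any program arguments.
yarn_job_cmd() {
  task_managers="$1"
  jar="$2"
  shift 2
  cmd="./bin/flink run -m yarn-cluster -yn $task_managers $jar"
  # Program arguments (if any) follow the jar unchanged.
  if [ "$#" -gt 0 ]; then cmd="$cmd $*"; fi
  echo "$cmd"
}

yarn_job_cmd 2 ./examples/batch/WordCount.jar \
  --input hdfs:///user/iteblog/LICENSE \
  --output hdfs:///user/iteblog/result.txt
```

Unlike session mode, each such submission starts its own Flink cluster on YARN for that single job, so the -yn value should be chosen per job.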
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
