
Deploying Apache Flink Clusters: Standalone and YARN Modes

This guide explains how to set up an Apache Flink cluster on CentOS 7 using three deployment methods—Local, Standalone, and Flink on YARN/Kubernetes—including host configuration, SSH setup, package distribution, configuration file editing, cluster start/stop commands, YARN resource manager concepts, session commands, job submission, fault‑tolerance settings, and log inspection.

Big Data Technology & Architecture

The article introduces three common deployment modes for Apache Flink: Local, Standalone, and Flink on YARN/Kubernetes, and provides links to introductory Flink tutorials.

1. Deployment Modes

Local

Standalone

Flink on YARN/Mesos/K8s…

2. Standalone Deployment

The cluster uses three CentOS 7 virtual machines: one master (JobManager) and two slaves (TaskManagers), with the following IPs:

Master: 192.168.246.134
Slave: 192.168.246.135
Slave: 192.168.246.136

All machines use the root account (password 123) and are configured for password‑less SSH.
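The password-less SSH setup can be sketched roughly as follows. This is a minimal sketch, assuming OpenSSH's ssh-keygen and ssh-copy-id are available on the master; the run helper and the DRY_RUN switch are hypothetical names added for illustration, and by default the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch: enable password-less SSH from the master to every node.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
HOSTS="192.168.246.134 192.168.246.135 192.168.246.136"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_ssh() {
  # Generate a key pair with an empty passphrase only if none exists yet.
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  # Install the public key on each host (asks for the root password once per host).
  for host in $HOSTS; do
    run ssh-copy-id "root@$host"
  done
}

setup_ssh
```

After running it with DRY_RUN=0, a command such as ssh root@192.168.246.135 hostname should return without a password prompt.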

The Flink 1.7.2 binary (with Hadoop 2.8 and Scala 2.11) and JDK 8 are copied to each host:

scp flink-1.7.2-bin-hadoop28-scala_2.11.tgz root@192.168.246.13X:~
scp jdk-8u11-linux-x64.tar.gz root@192.168.246.13X:~
# X = 4,5,6 for the three machines

After extraction, ownership is set to root and environment variables are exported (JAVA_HOME, JRE_HOME, CLASSPATH, PATH).
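The exported variables might look like the sketch below. The install location /usr/local is an assumption (the article does not say where the JDK was extracted); jdk1.8.0_11 is the directory name the jdk-8u11 archive unpacks to.

```shell
# Assumed install location; adjust to wherever the JDK archive was extracted.
export JAVA_HOME=/usr/local/jdk1.8.0_11
export JRE_HOME="$JAVA_HOME/jre"
# Classpath covering the current directory plus the JDK and JRE libraries.
export CLASSPATH=".:$JAVA_HOME/lib:$JRE_HOME/lib"
# Put the JDK binaries ahead of anything else on the PATH.
export PATH="$JAVA_HOME/bin:$PATH"
```

Appending these lines to /etc/profile (or root's ~/.bashrc) on each host makes them persist across logins.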

The flink-conf.yaml on the master is edited to set the JobManager address, task manager memory, number of slots, and default parallelism. The slaves file on the master lists the two slave IPs.
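As an illustration of those two files, the sketch below writes a minimal flink-conf.yaml and slaves file into a directory of your choice. The key names are the ones used by Flink 1.7; the memory, slot, and parallelism values are assumptions, not taken from the article, and write_flink_conf is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal flink-conf.yaml and slaves file into a conf directory.
# Note: this overwrites both files, so back up the shipped versions first.
write_flink_conf() {
  conf_dir="$1"
  mkdir -p "$conf_dir"

  cat > "$conf_dir/flink-conf.yaml" <<'EOF'
# Address of the master (JobManager)
jobmanager.rpc.address: 192.168.246.134
# Heap size of each TaskManager JVM (assumed value)
taskmanager.heap.size: 1024m
# Task slots offered by each TaskManager (assumed value)
taskmanager.numberOfTaskSlots: 2
# Parallelism for jobs that do not set their own (assumed value)
parallelism.default: 4
EOF

  # One TaskManager host per line.
  cat > "$conf_dir/slaves" <<'EOF'
192.168.246.135
192.168.246.136
EOF
}
```

For example, write_flink_conf /tmp/flink-conf-demo produces both files in a scratch directory so they can be compared with the shipped conf/ directory before anything is replaced. The assumed default parallelism of 4 matches two TaskManagers with two slots each.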

Cluster start/stop commands on the master:

# Start the cluster
./bin/start-cluster.sh
# Stop the cluster
./bin/stop-cluster.sh

3. Flink on YARN Deployment

Key YARN components are described:

ResourceManager (RM) : global resource scheduler.

NodeManager (NM) : per‑node agent managing containers.

ApplicationMaster (AM) : negotiates resources with RM and launches Flink tasks.

The YARN deployment steps are enumerated (submit application, allocate container, register AM, request resources, launch tasks, monitor, and cleanup).

To configure the Hadoop environment, point Flink at the Hadoop configuration directory:

export HADOOP_CONF_DIR=/path/to/your/hadoop

The available YARN session options are listed with:

./bin/yarn-session.sh -h

Starting a YARN session with four TaskManagers, each with 8 GB of memory and 8 task slots:

./bin/yarn-session.sh -n 4 -tm 8192 -s 8

Submitting a job to the YARN session:

./bin/flink run -c com.demo.wangzhiwu.WordCount $DEMO_DIR/target/flink-demo-1.0.SNAPSHOT.jar --port 9000
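A job submitted to a session can only use the slots that session provides, so it helps to work out the capacity of the `-n 4 -tm 8192 -s 8` session. A small sizing sketch (the variable names are illustrative, and the JobManager container is not included in the memory figure):

```shell
# Values taken from: ./bin/yarn-session.sh -n 4 -tm 8192 -s 8
TASK_MANAGERS=4      # -n: number of TaskManager containers
TM_MEMORY_MB=8192    # -tm: memory per TaskManager container, in MB
SLOTS_PER_TM=8       # -s: task slots per TaskManager

TOTAL_SLOTS=$((TASK_MANAGERS * SLOTS_PER_TM))
TOTAL_TM_MEMORY_MB=$((TASK_MANAGERS * TM_MEMORY_MB))

echo "total slots: $TOTAL_SLOTS"                                # total slots: 32
echo "total TaskManager memory: $TOTAL_TM_MEMORY_MB MB"         # 32768 MB
```

With these numbers the session offers 32 slots, which bounds the parallelism a single job can request.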

Running a job directly on YARN without a pre‑started session:

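# hdfs:/// (three slashes) resolves paths against the default NameNode from the Hadoop configuration
./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar \
  --input hdfs:///user/hadoop/input.txt \
  --output hdfs:///user/hadoop/output.txt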

For detached YARN sessions, add the -d or --detached flag; the client will exit after submission.
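Because the detached client exits right after submission, the application ID usually has to be looked up again later, for example from the output of yarn application -list. A minimal sketch, assuming IDs follow YARN's application_<timestamp>_<sequence> form; find_app_ids is a hypothetical helper, and the name it filters on is whatever appears in your own listing:

```shell
# Reads `yarn application -list` output on stdin and prints the IDs of
# applications whose line contains the given text.
find_app_ids() {
  grep -- "$1" | grep -oE 'application_[0-9]+_[0-9]+'
}

# Usage on a machine with the Hadoop client installed (filter text is an example):
#   yarn application -list | find_app_ids "Flink"
```

The printed ID is the value to pass wherever an <applicationId> is expected.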

A detached YARN session can no longer be stopped from the Flink client, so it must be stopped with the YARN tool:

yarn application -kill <applicationId>

4. Fault Tolerance Settings

Important Flink-YARN configuration parameters (set in flink-conf.yaml or passed with -D when starting the session) include:

yarn.reallocate-failed (default true) – whether to re-allocate failed TaskManager containers.

yarn.maximum-failed-containers – maximum number of failed containers before the session is considered failed.

yarn.application-attempts – number of ApplicationMaster (AM) retries.
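Passing these parameters with -D at session start could look like the sketch below. The values are illustrative assumptions rather than recommendations from the article, and the script only prints the resulting command instead of running it.

```shell
# Session size as in the earlier example.
SESSION_ARGS="-n 4 -tm 8192 -s 8"

# Fault-tolerance settings passed as dynamic properties (assumed values).
FT_ARGS="-Dyarn.reallocate-failed=true"
FT_ARGS="$FT_ARGS -Dyarn.maximum-failed-containers=4"
FT_ARGS="$FT_ARGS -Dyarn.application-attempts=3"

# Print the full command; run it from the Flink directory once the values look right.
echo "./bin/yarn-session.sh $SESSION_ARGS $FT_ARGS"
```

Note that YARN caps the number of ApplicationMaster attempts with its own yarn.resourcemanager.am.max-attempts setting in yarn-site.xml, so a larger yarn.application-attempts value has no effect unless that limit allows it.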

5. Log Inspection

If a Flink YARN session fails, enable YARN log aggregation (set yarn.log-aggregation-enable=true in yarn-site.xml) and view the logs with:

yarn logs -applicationId <application ID>

The article ends with a note that the full content is organized on GitHub and provides a link to the original source.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
