Step‑by‑Step Guide to Deploying Flink on Standalone, YARN, and Kubernetes

This tutorial explains how to install and configure Apache Flink in three deployment modes—Standalone, Hadoop YARN, and Kubernetes—covering node preparation, configuration files, package distribution, job submission, and monitoring through the Flink Web UI, with full command‑line examples and code snippets.

Open Source Linux

Mar 11, 2024

1. Standalone Cluster Deployment

1.1 Node Allocation

A Flink cluster consists of one JobManager (master) and one or more TaskManagers (workers). This example uses four CentOS 7.6 nodes: node1 runs the JobManager and a TaskManager, node2 and node3 each run a TaskManager, and a fourth node acts as the client for submitting jobs. All nodes need Java 8 or later and password‑less SSH between them.

1.2 Deploying the Standalone Cluster

Download a Flink binary release from the official site (this guide uses Flink 1.16.0) and extract it on the master node (node1).

[root@node1 software]# tar -zxvf ./flink-1.16.0-bin-scala_2.12.tgz

Configure the master node by editing $FLINK_HOME/conf/masters:

#vim $FLINK_HOME/conf/masters
node1:8081

Configure the worker nodes by editing $FLINK_HOME/conf/workers:

#vim $FLINK_HOME/conf/workers
node1
node2
node3

Edit $FLINK_HOME/conf/flink-conf.yaml (key parts shown):

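# JobManager RPC address; TaskManagers and clients connect to this host
jobmanager.rpc.address: node1
# Bind to all interfaces so the services are reachable from other nodes
jobmanager.bind-host: 0.0.0.0
taskmanager.bind-host: 0.0.0.0
# Address this TaskManager advertises; set it to each node's own hostname
taskmanager.host: node1
# Task slots offered by each TaskManager
taskmanager.numberOfTaskSlots: 3
# Address and bind address of the REST endpoint and Web UI
rest.address: node1
rest.bind-address: 0.0.0.0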

Use scp to copy the Flink directory, including the edited flink-conf.yaml, to the worker nodes (node2, node3). On each worker, change taskmanager.host to that node's own hostname.
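
The distribution step can be sketched as a small script. This is only an illustration under stated assumptions: the install path /software/flink-1.16.0, the root login, and the DRY_RUN switch are not from the original setup, so adjust them to your environment. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: copy Flink to each worker and set taskmanager.host per node.
# Assumed values (not from the article): install path, root user, DRY_RUN switch.
FLINK_HOME="${FLINK_HOME:-/software/flink-1.16.0}"
WORKERS="${WORKERS:-node2 node3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

distribute() {
    for host in $WORKERS; do
        # Copy the whole Flink directory into the same parent directory on the worker.
        run scp -r "$FLINK_HOME" "root@${host}:$(dirname "$FLINK_HOME")/"
        # Rewrite taskmanager.host on the worker so it names that worker.
        run ssh "root@${host}" "sed -i 's/^taskmanager.host:.*/taskmanager.host: ${host}/' $FLINK_HOME/conf/flink-conf.yaml"
    done
}

distribute
```

Run it once as is to review the printed commands, then run it again with DRY_RUN=0 to perform the copy.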

Start the cluster on the master node:

[root@node1 bin]# ./start-cluster.sh

Access the Flink Web UI at http://node1:8081 to verify the cluster is running.
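
Besides the Web UI, the same port serves Flink's REST API, and its /overview endpoint reports the number of registered TaskManagers and slots. The following is a minimal sketch of a command-line check. The helper names overview_field and check_cluster are made up for this example, and the sed-based parsing assumes the compact JSON that the endpoint normally returns.

```shell
#!/usr/bin/env bash
# Sketch: query the Flink REST API to confirm the cluster is up.
# REST_URL matches the rest.address and port used in this guide.
REST_URL="${REST_URL:-http://node1:8081}"

# Extract one numeric field (for example slots-total) from JSON on stdin.
overview_field() {
    sed -n "s/.*\"$1\": *\([0-9][0-9]*\).*/\1/p"
}

# Fetch /overview and print the TaskManager and slot counts.
check_cluster() {
    local json
    json="$(curl -s "$REST_URL/overview")" || return 1
    echo "TaskManagers: $(echo "$json" | overview_field taskmanagers)"
    echo "Total slots:  $(echo "$json" | overview_field slots-total)"
}
```

Call check_cluster from any node that can reach node1. With the workers file and taskmanager.numberOfTaskSlots setting shown above, a fully started cluster should report three TaskManagers and nine slots in total.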

2. Flink on YARN

Flink can run on YARN, which provides dynamic resource allocation. The client must have Hadoop 2.8.5+ installed and the HADOOP_CLASSPATH environment variable set.

# vim /etc/profile
export HADOOP_CLASSPATH=`hadoop classpath`
source /etc/profile
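
Before submitting anything, it can help to confirm that the variable is actually visible in the current shell. The check below is a small sketch, and the function name check_hadoop_classpath is invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: verify that HADOOP_CLASSPATH is set in the current shell.
check_hadoop_classpath() {
    if [ -z "${HADOOP_CLASSPATH:-}" ]; then
        echo "HADOOP_CLASSPATH is empty; run: export HADOOP_CLASSPATH=\$(hadoop classpath)" >&2
        return 1
    fi
    # Count the colon-separated classpath entries as a rough sanity check.
    echo "HADOOP_CLASSPATH has $(echo "$HADOOP_CLASSPATH" | awk -F: '{print NF}') entries"
}
```

Run check_hadoop_classpath in the shell from which you plan to call the flink script. A non-zero exit status means the profile change has not been loaded yet.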

Upload and extract the Flink package on the YARN client node (node5).

[root@node5 software]# tar -zxvf ./flink-1.16.0-bin-scala_2.12.tgz

Submit a Flink job in YARN Application mode (this uses the SocketWordCount example shown in Section 3):

[root@node5 bin]# ./flink run-application -t yarn-application -c com.mashibing.flinkjava.code.chapter3.SocketWordCount /root/FlinkJavaCode-1.0-SNAPSHOT-jar-with-dependencies.jar

Monitor the job via the YARN ResourceManager UI (http://node1:8088) and the Flink Web UI linked from the ApplicationMaster.
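
The job can also be inspected from the command line on the client node. The sketch below looks up the YARN application id and passes it to flink list. The helper names are hypothetical, and picking the first RUNNING application is an assumption that only holds when the Flink job is the only application running on the cluster.

```shell
#!/usr/bin/env bash
# Sketch: list Flink jobs inside a running YARN application.
# Assumption: the first RUNNING application in YARN is the Flink job.

# Pull the first YARN application id out of the text on stdin.
first_app_id() {
    grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Run from Flink's bin directory on the YARN client node.
list_flink_jobs() {
    local app_id
    app_id="$(yarn application -list -appStates RUNNING 2>/dev/null | first_app_id)"
    if [ -z "$app_id" ]; then
        echo "no running YARN application found" >&2
        return 1
    fi
    ./flink list -t yarn-application -Dyarn.application.id="$app_id"
}
```

Call list_flink_jobs to print the job id. To stop the job, the same options work with the cancel action: ./flink cancel -t yarn-application -Dyarn.application.id=<appId> <jobId>.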

3. Sample Flink Job (Socket WordCount)

The following Java program reads text from a socket, splits it into words, counts occurrences, and prints the result.

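package com.mashibing.flinkjava.code.chapter3;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Real‑time WordCount from socket data
 */
public class SocketWordCount {
    public static void main(String[] args) throws Exception {
        // 1. Set up execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2. Read lines of text from the socket at node5:9999
        DataStreamSource<String> ds = env.socketTextStream("node5", 9999);
        // 3. Split each line on commas and emit (word,1) tuples
        SingleOutputStreamOperator<Tuple2<String, Integer>> tupleDS = ds.flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
            String[] words = line.split(",");
            for (String word : words) {
                out.collect(Tuple2.of(word, 1));
            }
        }).returns(Types.TUPLE(Types.STRING, Types.INT)); // type hint needed because the lambda's generics are erased
        // 4. Group by word, keep a running sum of the counts, and print
        tupleDS.keyBy(tp -> tp.f0).sum(1).print();
        // 5. Execute the job
        env.execute();
    }
}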

Package the job with the Maven Assembly plugin, which produces a jar whose name ends in -jar-with-dependencies.jar, and submit it with the flink run-application command shown in Section 2.

4. Running the Job

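Start a netcat server on the data source node (node5). Do this before submitting the job, because the socket source connects to node5:9999 as soon as the job starts: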

[root@node5 ~]# yum -y install nc
[root@node5 ~]# nc -lk 9999

Type test lines (e.g., hello,a, hello,b, …) into the nc session. The job prints running word counts to the TaskManager's standard output, which can be viewed on the TaskManager's Stdout tab in the Flink Web UI.
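
To see what kind of output to expect, the counting logic can be imitated locally without Flink. The function below is a rough stand-in rather than the job itself: running_counts is a made-up name, it skips empty words (the Java code does not), and real Flink output is prefixed with the subtask index (for example 3>) when the parallelism is greater than one.

```shell
#!/usr/bin/env bash
# Sketch: imitate keyBy(word).sum(1) by printing a running count per word.
running_counts() {
    # Split on commas, then print each word with its updated count.
    tr ',' '\n' | awk 'NF { c[$0]++; printf "(%s,%d)\n", $0, c[$0] }'
}

printf 'hello,a\nhello,b\n' | running_counts
# prints:
# (hello,1)
# (a,1)
# (hello,2)
# (b,1)
```

Each input word produces one output record with the updated total, which is why hello appears twice with counts 1 and 2.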
