Step-by-Step Guide to Deploying Flink on Standalone, YARN, and Kubernetes
This tutorial explains how to install and configure Apache Flink in three deployment modes—Standalone, Hadoop YARN, and Kubernetes—covering node preparation, configuration files, package distribution, job submission, and monitoring through the Flink Web UI, with full command‑line examples and code snippets.
1. Standalone Cluster Deployment
1.1 Node Allocation
Flink's runtime consists of one JobManager (master) and one or more TaskManagers (workers). This example uses four CentOS 7.6 nodes: three cluster nodes (node1 runs the JobManager and a TaskManager; node2 and node3 each run a TaskManager) and one client node for submitting jobs. All nodes must have Java 8+ installed, and passwordless SSH must be configured between them.
1.2 Deploying the Standalone Cluster
Download the latest Flink package (e.g., Flink 1.16.0) from the official site and extract it on the master node (node1).
[root@node1 software]# tar -zxvf ./flink-1.16.0-bin-scala_2.12.tgz
Configure the master node by editing $FLINK_HOME/conf/masters:
#vim $FLINK_HOME/conf/masters
node1:8081
Configure the worker nodes by editing $FLINK_HOME/conf/workers:
#vim $FLINK_HOME/conf/workers
node1
node2
node3
Edit $FLINK_HOME/conf/flink-conf.yaml (key parts shown):
# JobManager address
jobmanager.rpc.address: node1
# Bind to all interfaces
jobmanager.bind-host: 0.0.0.0
taskmanager.bind-host: 0.0.0.0
taskmanager.host: node1
taskmanager.numberOfTaskSlots: 3
rest.address: node1
rest.bind-address: 0.0.0.0
Distribute the Flink directory, including the edited flink-conf.yaml, to the worker nodes (node2, node3) using scp, then set taskmanager.host on each worker to that worker's own hostname.
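The distribution step above can be scripted. The following is a minimal sketch, not the author's procedure: it assumes bash, GNU sed, that FLINK_HOME is set on node1, that Flink lives at the same path on every node, and that passwordless SSH is already working. The function names set_taskmanager_host and distribute_flink are hypothetical helpers introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy the Flink directory to each worker and fix taskmanager.host there.
# Assumptions: same install path on all nodes, GNU sed, passwordless SSH.

# Rewrite the taskmanager.host entry of one flink-conf.yaml in place.
set_taskmanager_host() {
  local conf_file="$1" host="$2"
  sed -i "s/^taskmanager\.host:.*/taskmanager.host: ${host}/" "$conf_file"
}

# Copy $FLINK_HOME to every listed worker, then patch the remote config
# so that each worker advertises its own hostname.
distribute_flink() {
  local flink_home="${FLINK_HOME:?FLINK_HOME must be set}"
  local host
  for host in "$@"; do
    scp -r "$flink_home" "${host}:$(dirname "$flink_home")/"
    ssh "$host" "sed -i 's/^taskmanager\.host:.*/taskmanager.host: ${host}/' '${flink_home}/conf/flink-conf.yaml'"
  done
}
```

With the node names used in this guide, the call would be distribute_flink node2 node3, run on node1 after the local configuration is finished.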
Start the cluster on the master node:
[root@node1 bin]# ./start-cluster.sh
Access the Flink Web UI at http://node1:8081 to verify the cluster is running.
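The same check can be made from the command line through the JobManager's REST API, which serves a cluster summary at /overview. The sketch below assumes curl is installed and that the REST endpoint is reachable at http://node1:8081 as configured above. The helper names overview_field and check_cluster are hypothetical, and the grep-based extraction is a shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: query the JobManager REST endpoint and print two counters.

# Extract an integer field (e.g. taskmanagers, slots-total) from JSON on stdin.
overview_field() {
  local field="$1"
  grep -oE "\"${field}\"[[:space:]]*:[[:space:]]*[0-9]+" | grep -oE '[0-9]+$'
}

# Fetch /overview and report registered TaskManagers and total task slots.
check_cluster() {
  local rest_url="${1:-http://node1:8081}"
  local overview
  overview="$(curl -sf "${rest_url}/overview")" || {
    echo "REST API not reachable at ${rest_url}" >&2
    return 1
  }
  echo "taskmanagers=$(overview_field taskmanagers <<<"$overview")"
  echo "slots-total=$(overview_field slots-total <<<"$overview")"
}
```

Running check_cluster on any node that can reach node1 should report three TaskManagers and nine slots if all three workers registered, since each worker is configured with taskmanager.numberOfTaskSlots: 3.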
2. Flink on YARN
Flink can run on YARN, which provides dynamic resource allocation. The client must have Hadoop 2.8.5+ installed and the HADOOP_CLASSPATH environment variable set.
# vim /etc/profile
export HADOOP_CLASSPATH=`hadoop classpath`
source /etc/profile
Upload and extract the Flink package on the YARN client node (node5).
[root@node5 software]# tar -zxvf ./flink-1.16.0-bin-scala_2.12.tgz
Submit a Flink job in YARN Application mode (using the SocketWordCount example from Section 3):
[root@node5 bin]# ./flink run-application -t yarn-application -c com.mashibing.flinkjava.code.chapter3.SocketWordCount /root/FlinkJavaCode-1.0-SNAPSHOT-jar-with-dependencies.jar
Monitor the job via the YARN ResourceManager UI (http://node1:8088) and the Flink Web UI linked from the ApplicationMaster.
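Before running the submission command above, it can help to confirm that HADOOP_CLASSPATH is actually populated in the current shell, since an empty value is an easy mistake when /etc/profile has not been re-sourced. The following is a small preflight sketch, assuming bash. check_hadoop_classpath is a hypothetical helper, not part of Flink or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: fail early if HADOOP_CLASSPATH is unset or empty in this shell.
check_hadoop_classpath() {
  if [ -z "${HADOOP_CLASSPATH:-}" ]; then
    echo "HADOOP_CLASSPATH is empty; run: export HADOOP_CLASSPATH=\$(hadoop classpath)" >&2
    return 1
  fi
  # Count the non-empty, colon-separated classpath entries.
  echo "HADOOP_CLASSPATH has $(tr ':' '\n' <<<"$HADOOP_CLASSPATH" | grep -c .) entries"
}
```

A typical use is check_hadoop_classpath && ./flink run-application ..., so the submission is skipped when the variable is missing.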
3. Sample Flink Job (Socket WordCount)
The following Java program reads text from a socket, splits it into words, counts occurrences, and prints the result.
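package com.mashibing.flinkjava.code.chapter3;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Real-time WordCount from socket data
 */
public class SocketWordCount {
    public static void main(String[] args) throws Exception {
        // 1. Set up execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2. Read lines of text from the socket on node5:9999
        DataStreamSource<String> ds = env.socketTextStream("node5", 9999);
        // 3. Split each line on commas and emit (word, 1) tuples
        SingleOutputStreamOperator<Tuple2<String, Integer>> tupleDS = ds.flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
            String[] words = line.split(",");
            for (String word : words) {
                out.collect(Tuple2.of(word, 1));
            }
        }).returns(Types.TUPLE(Types.STRING, Types.INT)); // lambdas lose generic type info, so declare it explicitly
        // 4. Group by word, keep a running sum of field 1, and print each update
        tupleDS.keyBy(tp -> tp.f0).sum(1).print();
        // 5. Execute the job
        env.execute();
    }
}
Package the job with the Maven Assembly plugin (producing a -jar-with-dependencies.jar) and submit it as shown above.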
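The submission shown earlier targets YARN. For the standalone cluster from Section 1, the packaged jar can be submitted with flink run instead. The sketch below assumes bash and that FLINK_HOME points at the Flink installation on the client node. It uses the standard -m (JobManager address), -c (entry class), and -d (detached) options. submit_standalone and the DRY_RUN variable are hypothetical conveniences for illustration.

```shell
#!/usr/bin/env bash
# Sketch: submit a jar to the standalone cluster in detached mode.
# Set DRY_RUN=1 to print the command instead of running it.
submit_standalone() {
  local main_class="$1" jar="$2" jobmanager="${3:-node1:8081}"
  local cmd=("${FLINK_HOME:?FLINK_HOME must be set}/bin/flink"
             run -d -m "$jobmanager" -c "$main_class" "$jar")
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For this guide's job the call would be submit_standalone com.mashibing.flinkjava.code.chapter3.SocketWordCount /root/FlinkJavaCode-1.0-SNAPSHOT-jar-with-dependencies.jar. Because the job opens a socket to node5:9999 when it starts, the netcat listener from Section 4 should already be running at submission time.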
4. Running the Job
Start a netcat server on the data source node (node5):
[root@node5 ~]# yum -y install nc
[root@node5 ~]# nc -lk 9999
Type test lines into the netcat session (e.g., hello,a then hello,b, …). The Flink job writes the running word counts to the TaskManager's standard output, which can be viewed in the Flink Web UI under the TaskManager's Stdout tab.
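To see what output to expect for a given input, the job's logic can be imitated locally without Flink. The following sketch, assuming bash and awk, splits each line on commas and prints the running count per word, which mirrors the flatMap, keyBy, and sum(1) steps. simulate_wordcount is a hypothetical helper, not part of the job.

```shell
#!/usr/bin/env bash
# Local stand-in for the job's logic: split each line on commas and print the
# running count per word, one tuple per incoming word (no Flink involved).
simulate_wordcount() {
  awk -F',' '{
    for (i = 1; i <= NF; i++) {
      count[$i]++
      printf "(%s,%d)\n", $i, count[$i]
    }
  }'
}

printf 'hello,a\nhello,b\n' | simulate_wordcount
# prints:
# (hello,1)
# (a,1)
# (hello,2)
# (b,1)
```

In the real job, each printed line is typically prefixed with the index of the subtask that produced it when the parallelism is greater than one, and the relative order of different words may differ because keys are processed in parallel.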
