Deploying Apache Flink 1.12 on Kubernetes: High‑Availability Architecture and DataStream Batch Execution
This article explains how Flink 1.12 introduces production‑grade Kubernetes high‑availability, details the underlying architecture and deployment steps, and shows how the DataStream API can run in batch mode using runtime‑mode configuration and example commands.
Before version 1.12, running Flink with high availability on Kubernetes required ZooKeeper, and the DataStream API had no batch execution mode. Flink 1.12 addresses both gaps: it provides a production‑grade high‑availability solution on Kubernetes that allows JobManager failover without ZooKeeper, and it lets DataStream programs run in batch mode.
High‑Availability on Kubernetes
Flink 1.12 integrates with Kubernetes (K8s) to manage resources and uses ConfigMap objects to store the metadata needed for JobManager recovery. The client submits resource descriptions (ConfigMap, Services, Deployments) to the K8s API server, which creates the necessary pods.
The workflow includes:
1. The Flink client connects to the K8s API server and submits the cluster description.
2. K8s creates JobManager and TaskManager pods based on the deployments.
3. The user submits a job via the Flink client; the job graph is uploaded through the REST client.
4. The JobMaster requests slots from the KubernetesResourceManager, which allocates TaskManager pods.
5. TaskManagers register with the SlotManager and receive slots to execute the job.
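The slot request and allocation steps at the end of this workflow can be illustrated with a little arithmetic. The sketch below is not Flink code: `SlotEstimate` and `taskManagersNeeded` are hypothetical names, and it assumes that the job needs as many slots as its highest operator parallelism and that every TaskManager pod offers the same number of slots (the `taskmanager.numberOfTaskSlots` setting).

```java
// Hypothetical helper, not part of Flink: estimates how many TaskManager pods
// would have to be started to satisfy a job's slot requests.
final class SlotEstimate {

    /**
     * @param requiredSlots       slots requested by the JobMaster (assumed here to equal
     *                            the job's highest operator parallelism)
     * @param slotsPerTaskManager slots offered by each TaskManager pod
     *                            (taskmanager.numberOfTaskSlots)
     * @return number of TaskManager pods needed to cover the request
     */
    static int taskManagersNeeded(int requiredSlots, int slotsPerTaskManager) {
        if (requiredSlots < 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException(
                    "requiredSlots must be >= 0 and slotsPerTaskManager must be > 0");
        }
        // Ceiling division: a partially used TaskManager still occupies a whole pod.
        return (requiredSlots + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        System.out.println(taskManagersNeeded(10, 4)); // prints 3
        System.out.println(taskManagersNeeded(8, 4));  // prints 2
    }
}
```

With 10 requested slots and 4 slots per pod, three pods are needed and two slots stay idle, which is one reason to align job parallelism with the slot count per TaskManager.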
Key advantages of K8s over YARN include native container isolation, integration with cloud‑native monitoring systems such as Prometheus, and better multi‑tenant support, though K8s may incur higher operational complexity.
1. K8s, as the de‑facto container management standard, offers resource and network isolation, security, and multi‑tenant advantages
2. Seamless integration with cloud‑native monitoring systems like Prometheus
3. YARN lacks load‑balancing and mixed‑deployment features, resulting in lower resource utilization
Deployment examples (a standalone session cluster on Kubernetes, created with kubectl) use ConfigMaps for configuration files such as flink-conf.yaml and define services and deployments for JobManager and TaskManager.
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3:///flink/recovery
kubernetes.cluster-id: cluster1337

# Create ConfigMap and services
$ kubectl create -f flink-configuration-configmap.yaml
$ kubectl create -f jobmanager-service.yaml
# Deploy JobManager and TaskManager
$ kubectl create -f jobmanager-session-deployment.yaml
$ kubectl create -f taskmanager-session-deployment.yaml

Sample JobManager deployment (YAML) and TaskManager deployment are provided, showing container images, ports, liveness probes, and volume mounts for configuration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: flink:1.12.0-scala_2.11
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties

DataStream API Batch Execution
Flink 1.12 adds a runtime mode configuration (execution.runtime-mode) that lets the DataStream API run in BATCH, STREAMING, or AUTOMATIC mode, unifying batch and stream processing.
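How AUTOMATIC chooses between the other two modes can be sketched in a few lines. The code below is a standalone illustration rather than Flink's implementation: the `Mode` enum is a local stand-in for Flink's `RuntimeExecutionMode`, `resolve` is a hypothetical helper, and the rule it encodes (BATCH only when every source is bounded, otherwise STREAMING) is the behavior described in the Flink documentation.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch; does not depend on Flink.
final class RuntimeModeSketch {

    // Local stand-in for Flink's RuntimeExecutionMode.
    enum Mode { STREAMING, BATCH, AUTOMATIC }

    /**
     * Resolves the effective mode. Explicit STREAMING or BATCH is returned unchanged;
     * AUTOMATIC becomes BATCH only if every source is bounded.
     * Note: Flink rejects BATCH combined with an unbounded source; this sketch
     * does not model that check.
     */
    static Mode resolve(Mode configured, List<Boolean> sourceIsBounded) {
        if (configured != Mode.AUTOMATIC) {
            return configured;
        }
        boolean allBounded = sourceIsBounded.stream().allMatch(bounded -> bounded);
        return allBounded ? Mode.BATCH : Mode.STREAMING;
    }

    public static void main(String[] args) {
        // Two bounded sources (for example, files): AUTOMATIC resolves to BATCH.
        System.out.println(resolve(Mode.AUTOMATIC, Arrays.asList(true, true)));
        // One unbounded source (for example, a message queue): resolves to STREAMING.
        System.out.println(resolve(Mode.AUTOMATIC, Arrays.asList(true, false)));
    }
}
```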
The mode can be passed on the command line when submitting a job:
$ bin/flink run -Dexecution.runtime-mode=BATCH examples/streaming/WordCount.jar
In code the mode can be set as:
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
Streaming and batch execution also differ in trigger behavior, state backend usage, watermark relevance, failure handling, and supported operators.
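One visible consequence of these differences is when a keyed rolling aggregation emits results. The sketch below simulates it in plain Java without the Flink API: `EmissionSketch` and both methods are hypothetical names, and the behavior modeled (an updated count per record in streaming mode, only the final count per key in batch mode) follows the Flink documentation for operations such as `sum()` and `reduce()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; does not use the Flink API.
final class EmissionSketch {

    // STREAMING-style: emit an updated count after every incoming word.
    static List<String> streamingCounts(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String word : words) {
            int updated = counts.merge(word, 1, Integer::sum);
            emitted.add(word + "=" + updated);
        }
        return emitted;
    }

    // BATCH-style: consume the whole bounded input, then emit one final count per word.
    static List<String> batchCounts(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<String> emitted = new ArrayList<>();
        counts.forEach((word, count) -> emitted.add(word + "=" + count));
        return emitted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a");
        System.out.println(streamingCounts(words)); // prints [a=1, b=1, a=2]
        System.out.println(batchCounts(words));     // prints [a=2, b=1]
    }
}
```

The same three input records yield three outputs in the streaming simulation and two in the batch one, so downstream consumers that expect incremental updates need to be checked when a job is switched to BATCH.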
Overall, the guide demonstrates how to configure, deploy, and manage a Flink 1.12 session cluster on Kubernetes with high availability and how to leverage the new batch capabilities of the DataStream API.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.