Mastering PyFlink on Kubernetes: Practical Deployment Strategies and Lessons
This article explains how to deploy a PyFlink feature platform on Kubernetes, covering basic K8s concepts, Flink execution graphs, various deployment modes, preparation steps, detailed Standalone and Native deployment procedures, and practical tips for efficient big‑data processing.
Background
To solve systematic issues in model and feature iteration and improve algorithm development efficiency, the Big Data Technology and Product team launched a feature platform project. The platform consolidates scattered data storage, eliminates duplicate definitions, simplifies extraction, and shortens pipelines, providing strong sample and feature data support for data scientists, engineers, and ML engineers. It also offers a Zeppelin Notebook with multiple interpreters for easier data analysis and visualization.
Kubernetes Overview
Pod : The atomic scheduling unit in K8s, a group of one or more containers sharing network and storage.
Deployment : A higher‑level abstraction for a set of identical Pods, ensuring high availability.
Service : Defines an access entry point for a set of Pods, selectable via labels; can be exposed externally via LoadBalancer or NodePort, or kept internal as ClusterIP.
ConfigMap : Key‑value configuration data, typically mounted into Pods as configuration files.
Flink Task Execution Graph
Flink jobs progress through several graph transformations:
StreamGraph → JobGraph → ExecutionGraph → Physical Execution Graph. The conversion steps are:
StreamGraph: Starts from Source nodes, each transform creates a StreamNode; StreamNodes are linked by StreamEdges forming a DAG.
JobGraph: Traverses the StreamGraph, merges compatible operators into JobVertices; unmerged operators become separate JobVertices linked by JobEdges, forming a JobVertex DAG.
ExecutionGraph: JobVertices are sorted, ExecutionJobVertices are created, IntermediateDataSets are built into IntermediateResults, establishing dependencies and forming the ExecutionGraph DAG.
Physical Execution: The ExecutionGraph is finally mapped to the physical execution layer.
PyFlink on Kubernetes Deployment Modes
Flink Deployment Modes
Session Mode
Multiple jobs share a single JobManager; the JobGraph is generated on the Flink client.
Per‑Job Mode
A dedicated JobManager is started for each job, which executes the job and then exits.
Application Mode
Similar to Per‑Job, a dedicated JobManager is launched, but the JobGraph is generated on the JobManager itself.
PyFlink on Kubernetes
Standalone : Requires kubectl and YAML files; Flink does not detect the K8s cluster, so resources are allocated passively.
Native : Uses the Flink client scripts (e.g., kubernetes-session.sh or flink run) to actively request resources from K8s.
Deployment Preparation
Kubernetes Cluster
K8s version ≥ 1.9 or Minikube, with a valid KubeConfig, DNS enabled, and a ServiceAccount that has RBAC permissions to create and delete Pods.
PyFlink Docker Image
FROM flink:1.12.1-scala_2.11-java8
# Install Python 3 and pip, plus debugging tools
RUN apt-get update -y && \
apt-get install -y python3.7 python3-pip python3.7-dev && \
rm -rf /var/lib/apt/lists/*
RUN rm -f /usr/bin/python && ln -s /usr/bin/python3 /usr/bin/python
# Install Python Flink
RUN pip3 install apache-flink==1.12.1
# Optional: add third‑party Python or Java dependencies
RUN mkdir -p $FLINK_HOME/usrlib
COPY /path/of/external/jar/dependencies $FLINK_HOME/usrlib/
COPY /path/of/python/codes /opt/python_codes
# Build the final PyFlink imageStandalone Deployment
Session Mode
Create a static session cluster using kubectl to apply ConfigMap, Service, JobManager Deployment, and TaskManager Deployment resources.
Submit jobs to the created session cluster.
Overall workflow:
Step 1: Use kubectl or the K8s dashboard to submit a request to the master.
Step 2: The master creates the Flink Master Deployment, TaskManager Deployment, ConfigMap, and Service.
Step 3: TaskManagers register with the JobManager.
Step 4: Submit the Flink job via flink run to the JobManager address.
Step 5: Dispatcher creates a JobMaster for the new job.
Step 6: JobMaster requests resources; in Standalone mode the resources are already running, so the request returns immediately.
Step 7‑8: JobMaster deploys tasks to the TaskManagers and the job runs.
Commands
kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml
kubectl create -f jobmanager-rest-service.yamlWeb UI:
http://your_ip:your_port/ # Submit a Flink streaming job
./bin/flink run -m your_ip:your_port ./examples/streaming/TopSpeedWindowing.jar
# Submit a PyFlink batch job
sudo flink run -m your_ip:your_port -pyfs ./examples/python/table/batch -py word_count kubectl delete -f jobmanager-rest-service.yaml
kubectl delete -f jobmanager-service.yaml
kubectl delete -f flink-configuration-configmap.yaml
kubectl delete -f taskmanager-session-deployment.yaml
kubectl delete -f jobmanager-session-deployment.yamlApplication Mode (Standalone)
Job submission flow: the Standalone JobCluster EntryPoint finds the user JAR, generates a JobGraph, submits it to the Dispatcher, which creates a JobMaster, requests resources, and finally runs the job.
kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-rest-service.yaml
kubectl create -f jobmanager-application.yaml
kubectl create -f taskmanager-job-deployment.yamlWeb UI:
http://your_ip:your_port/Native Deployment
Session Mode
./bin/kubernetes-session.sh \
-Dkubernetes.cluster-id=session-cluster-1 \
-Dtaskmanager.numberOfTaskSlots=1 \
-Dresourcemanager.taskmanager-timeout=3600000 \
-Dkubernetes.rest-service.exposed.type=NodePort \
-Dkubernetes.container.image=demo-pyflink-app:1.12.1 ./bin/flink run \
--target kubernetes-session \
-Dkubernetes.cluster-id=session-cluster-1 \
-pyfs ./examples/python/table/batch \
-pym new_word_count echo 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=session-cluster-1 -Dexecution.attached=true
# or
kubectl delete deployment/session-cluster-1Application Mode (Native)
./bin/flink run-application -p 2 -t kubernetes-application \
-Dkubernetes.cluster-id=app-cluster \
-Dtaskmanager.memory.process.size=4096m \
-Dkubernetes.taskmanager.cpu=2 \
-Dtaskmanager.numberOfTaskSlots=4 \
-Dkubernetes.container.image=demo-pyflink-app:1.12.1 \
-pyfs /opt/python_codes \
-pym new_word_countConclusion
This article shares practical experience deploying a feature platform with PyFlink on Kubernetes, briefly introduces basic K8s concepts and Flink execution graphs, compares different Flink deployment modes, and provides concrete demos that illustrate component coordination during deployment, helping readers both get started and understand the underlying execution process.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
