Big Data 16 min read

Mastering PyFlink on Kubernetes: Practical Deployment Strategies and Lessons

This article explains how to deploy a PyFlink feature platform on Kubernetes, covering basic K8s concepts, Flink execution graphs, various deployment modes, preparation steps, detailed Standalone and Native deployment procedures, and practical tips for efficient big‑data processing.

Huolala Tech
Huolala Tech
Huolala Tech
Mastering PyFlink on Kubernetes: Practical Deployment Strategies and Lessons

Background

To solve systematic issues in model and feature iteration and improve algorithm development efficiency, the Big Data Technology and Product team launched a feature platform project. The platform consolidates scattered data storage, eliminates duplicate definitions, simplifies extraction, and shortens pipelines, providing strong sample and feature data support for data scientists, engineers, and ML engineers. It also offers a Zeppelin Notebook with multiple interpreters for easier data analysis and visualization.

Kubernetes Overview

Pod : The atomic scheduling unit in K8s, a group of one or more containers sharing network and storage.

Deployment : A higher‑level abstraction for a set of identical Pods, ensuring high availability.

Service : Defines an access entry point for a set of Pods, selectable via labels; can be exposed externally via LoadBalancer or NodePort, or kept internal as ClusterIP.

ConfigMap : Key‑value configuration data, typically mounted into Pods as configuration files.

Flink Task Execution Graph

Flink jobs progress through several graph transformations:

StreamGraph → JobGraph → ExecutionGraph → Physical Execution Graph

. The conversion steps are:

StreamGraph: Starts from Source nodes, each transform creates a StreamNode; StreamNodes are linked by StreamEdges forming a DAG.

JobGraph: Traverses the StreamGraph, merges compatible operators into JobVertices; unmerged operators become separate JobVertices linked by JobEdges, forming a JobVertex DAG.

ExecutionGraph: JobVertices are sorted, ExecutionJobVertices are created, IntermediateDataSets are built into IntermediateResults, establishing dependencies and forming the ExecutionGraph DAG.

Physical Execution: The ExecutionGraph is finally mapped to the physical execution layer.

PyFlink on Kubernetes Deployment Modes

Flink Deployment Modes

Session Mode

Multiple jobs share a single JobManager; the JobGraph is generated on the Flink client.

Per‑Job Mode

A dedicated JobManager is started for each job, which executes the job and then exits.

Application Mode

Similar to Per‑Job, a dedicated JobManager is launched, but the JobGraph is generated on the JobManager itself.

PyFlink on Kubernetes

Standalone : Requires kubectl and YAML files; Flink does not detect the K8s cluster, so resources are allocated passively.

Native : Uses the Flink client scripts (e.g., kubernetes-session.sh or flink run) to actively request resources from K8s.

Deployment Preparation

Kubernetes Cluster

K8s version ≥ 1.9 or Minikube, with a valid KubeConfig, DNS enabled, and a ServiceAccount that has RBAC permissions to create and delete Pods.

PyFlink Docker Image

FROM flink:1.12.1-scala_2.11-java8
# Install Python 3 and pip, plus debugging tools
RUN apt-get update -y && \
    apt-get install -y python3.7 python3-pip python3.7-dev && \
    rm -rf /var/lib/apt/lists/*
RUN rm -f /usr/bin/python && ln -s /usr/bin/python3 /usr/bin/python
# Install Python Flink
RUN pip3 install apache-flink==1.12.1
# Optional: add third‑party Python or Java dependencies
RUN mkdir -p $FLINK_HOME/usrlib
COPY /path/of/external/jar/dependencies $FLINK_HOME/usrlib/
COPY /path/of/python/codes /opt/python_codes
# Build the final PyFlink image

Standalone Deployment

Session Mode

Create a static session cluster using kubectl to apply ConfigMap, Service, JobManager Deployment, and TaskManager Deployment resources.

Submit jobs to the created session cluster.

Overall workflow:

Step 1: Use kubectl or the K8s dashboard to submit a request to the master.

Step 2: The master creates the Flink Master Deployment, TaskManager Deployment, ConfigMap, and Service.

Step 3: TaskManagers register with the JobManager.

Step 4: Submit the Flink job via flink run to the JobManager address.

Step 5: Dispatcher creates a JobMaster for the new job.

Step 6: JobMaster requests resources; in Standalone mode the resources are already running, so the request returns immediately.

Step 7‑8: JobMaster deploys tasks to the TaskManagers and the job runs.

Commands

kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml
kubectl create -f jobmanager-rest-service.yaml

Web UI:

http://your_ip:your_port/
# Submit a Flink streaming job
./bin/flink run -m your_ip:your_port ./examples/streaming/TopSpeedWindowing.jar
# Submit a PyFlink batch job
sudo flink run -m your_ip:your_port -pyfs ./examples/python/table/batch -py word_count
kubectl delete -f jobmanager-rest-service.yaml
kubectl delete -f jobmanager-service.yaml
kubectl delete -f flink-configuration-configmap.yaml
kubectl delete -f taskmanager-session-deployment.yaml
kubectl delete -f jobmanager-session-deployment.yaml

Application Mode (Standalone)

Job submission flow: the Standalone JobCluster EntryPoint finds the user JAR, generates a JobGraph, submits it to the Dispatcher, which creates a JobMaster, requests resources, and finally runs the job.

kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-rest-service.yaml
kubectl create -f jobmanager-application.yaml
kubectl create -f taskmanager-job-deployment.yaml

Web UI:

http://your_ip:your_port/

Native Deployment

Session Mode

./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=session-cluster-1 \
  -Dtaskmanager.numberOfTaskSlots=1 \
  -Dresourcemanager.taskmanager-timeout=3600000 \
  -Dkubernetes.rest-service.exposed.type=NodePort \
  -Dkubernetes.container.image=demo-pyflink-app:1.12.1
./bin/flink run \
  --target kubernetes-session \
  -Dkubernetes.cluster-id=session-cluster-1 \
  -pyfs ./examples/python/table/batch \
  -pym new_word_count
echo 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=session-cluster-1 -Dexecution.attached=true
# or
kubectl delete deployment/session-cluster-1

Application Mode (Native)

./bin/flink run-application -p 2 -t kubernetes-application \
  -Dkubernetes.cluster-id=app-cluster \
  -Dtaskmanager.memory.process.size=4096m \
  -Dkubernetes.taskmanager.cpu=2 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dkubernetes.container.image=demo-pyflink-app:1.12.1 \
  -pyfs /opt/python_codes \
  -pym new_word_count

Conclusion

This article shares practical experience deploying a feature platform with PyFlink on Kubernetes, briefly introduces basic K8s concepts and Flink execution graphs, compares different Flink deployment modes, and provides concrete demos that illustrate component coordination during deployment, helping readers both get started and understand the underlying execution process.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Cloud NativeFlinkDeploymentKubernetesPyFlink
Huolala Tech
Written by

Huolala Tech

Technology reshapes logistics

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.