Optimizing Flink Task Scheduling on a Kubernetes Standalone Cluster for Balanced Resource Utilization
This article analyzes uneven task distribution in a Flink job running on a Kubernetes standalone cluster with 35 TaskManagers and 140 slots. It proposes two strategies, scheduling larger slot sharing groups first and delaying scheduling until enough TaskManagers have registered, and shows how they balance CPU load and reduce data backlog.
Background : The Flink job runs on a Kubernetes standalone cluster: the Flink cluster is first launched in containers, and jobs are then submitted to it. Because job submission and TaskManager registration happen concurrently, the job can start requesting slots before all TaskManagers have registered.
Problem : With 35 TaskManagers providing 140 slots, the tasks of any vertex whose parallelism is below 140 are placed unevenly. For example, the tasks of one vertex end up concentrated on a few TaskManagers, overloading them. The issue persists even when cluster.evenly-spread-out-slots=true is set.
Observed Topology : The job contains five vertices: two have parallelism 140, and the other three have parallelism 10, 30, and 35. The maximum parallelism is therefore 140, and the cluster consists of 35 TaskManagers, each with 4 cores and 8 GB of memory (4 slots per TaskManager).
Optimization Analysis : The problem can be reduced to a simpler topology such as Vertex A(p=2) → Vertex B(p=4) → Vertex C(p=2). With slot sharing and the preference for local data transfer, the tasks are divided into four ExecutionSlotSharingGroups: {A1,B1,C1}, {A2,B2,C2}, {B3}, {B4}. If each TaskManager has two slots, both three-task groups can end up on the same TaskManager, which then runs six tasks while the other runs only B3 and B4, making the heavier TaskManager a bottleneck.
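The grouping and the resulting imbalance can be reproduced with a small Python model. This is a sketch under simplifying assumptions, not Flink's scheduler: `build_groups`, `packed_load` and `round_robin_load` are hypothetical helpers; group i is assumed to hold subtask i of every vertex whose parallelism exceeds i; the vertex names V1 to V5 for the production job are made up; and the number of tasks per TaskManager is used only as a rough proxy for CPU load. "Packed" assumes slots are filled TaskManager by TaskManager, as can happen when slot requests race with TaskManager registration; "round robin" assumes all TaskManagers are already registered.

```python
from typing import Dict, List


def build_groups(parallelism: Dict[str, int]) -> List[List[str]]:
    """Simplified grouping model (an assumption, not Flink's actual strategy):
    group i holds subtask i of every vertex whose parallelism exceeds i."""
    num_groups = max(parallelism.values())
    return [[f"{v}{i + 1}" for v, p in parallelism.items() if p > i]
            for i in range(num_groups)]


def packed_load(groups: List[List[str]], num_tms: int, slots_per_tm: int) -> List[int]:
    """Tasks per TM when groups fill TMs one after another, as if each TM
    registered and was filled before the next one appeared."""
    return [sum(len(g) for g in groups[t * slots_per_tm:(t + 1) * slots_per_tm])
            for t in range(num_tms)]


def round_robin_load(groups: List[List[str]], num_tms: int) -> List[int]:
    """Tasks per TM when group i goes to TM i % num_tms, i.e. all TMs are
    already registered and every request picks the least-used TM."""
    load = [0] * num_tms
    for i, g in enumerate(groups):
        load[i % num_tms] += len(g)
    return load


if __name__ == "__main__":
    toy = build_groups({"A": 2, "B": 4, "C": 2})
    print(toy)                      # [['A1', 'B1', 'C1'], ['A2', 'B2', 'C2'], ['B3'], ['B4']]
    print(packed_load(toy, 2, 2))   # [6, 2]

    # The job above: two vertices at p=140 plus p=10, 30, 35; 35 TMs x 4 slots.
    job = build_groups({"V1": 140, "V2": 140, "V3": 10, "V4": 30, "V5": 35})
    packed = packed_load(job, 35, 4)
    rr = round_robin_load(job, 35)
    print(max(packed), min(packed))  # 20 8
    print(max(rr), min(rr))          # 11 9
```

Under these assumptions, the production job's per-TaskManager task count ranges from 8 to 20 when slots are packed, but only from 9 to 11 when all 35 TaskManagers are available up front, which is the motivation for the delayed scheduling proposed below.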
Proposed Optimizations :
When requesting slots for ExecutionSlotSharingGroups, sort the groups by the number of tasks they contain and schedule the groups with more tasks first.
Delay task scheduling until enough TaskManagers have registered for the groups to be distributed evenly, and only then request slots.
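The two steps can be sketched as a small Python simulation of the simplified topology. It is a model under stated assumptions, not Flink's scheduler code: `required_tms`, `assign`, `schedule_when_ready` and `tasks_per_tm` are hypothetical names; TaskManager registration is modeled as a list of snapshots of the TaskManagers registered so far; each group goes to the registered TaskManager with the fewest used slots, in the spirit of cluster.evenly-spread-out-slots; and the request order in the example is invented to show a light group arriving before a heavy one.

```python
import math
from typing import Dict, Iterable, List

Group = List[str]  # task names in one ExecutionSlotSharingGroup


def required_tms(num_groups: int, slots_per_tm: int) -> int:
    # Each group needs one slot; fewer TMs than this cannot host all groups.
    return math.ceil(num_groups / slots_per_tm)


def assign(groups: List[Group], tms: List[str], slots_per_tm: int,
           sort_by_size: bool = True) -> Dict[str, List[Group]]:
    # Optimization 1: serve groups with more tasks first (sorted() is stable).
    order = sorted(groups, key=len, reverse=True) if sort_by_size else list(groups)
    placement: Dict[str, List[Group]] = {tm: [] for tm in tms}
    for group in order:
        free = [tm for tm in tms if len(placement[tm]) < slots_per_tm]
        if not free:
            raise RuntimeError("not enough free slots")
        # Spread by slot usage; min() breaks ties in registration order.
        target = min(free, key=lambda tm: len(placement[tm]))
        placement[target].append(group)
    return placement


def schedule_when_ready(groups: List[Group], snapshots: Iterable[List[str]],
                        slots_per_tm: int) -> Dict[str, List[Group]]:
    # Optimization 2: skip snapshots (i.e. keep waiting) until enough TMs
    # have registered, then assign every group in one pass.
    needed = required_tms(len(groups), slots_per_tm)
    for tms in snapshots:
        if len(tms) >= needed:
            return assign(groups, tms, slots_per_tm)
    raise TimeoutError(f"fewer than {needed} TaskManagers registered")


def tasks_per_tm(placement: Dict[str, List[Group]]) -> Dict[str, int]:
    return {tm: sum(len(g) for g in gs) for tm, gs in placement.items()}


if __name__ == "__main__":
    # Hypothetical request order in which a light group precedes a heavy one.
    requests = [["A1", "B1", "C1"], ["B3"], ["A2", "B2", "C2"], ["B4"]]
    both = ["tm-1", "tm-2"]
    print(tasks_per_tm(assign(requests, both, 2, sort_by_size=False)))  # {'tm-1': 6, 'tm-2': 2}
    print(tasks_per_tm(assign(requests, both, 2)))                      # {'tm-1': 4, 'tm-2': 4}
    print(tasks_per_tm(schedule_when_ready(requests, [["tm-1"], both], 2)))  # {'tm-1': 4, 'tm-2': 4}
```

Sorting makes the two heavy groups claim slots on different TaskManagers before the light groups fill the remaining slots, and waiting for the second snapshot ensures both TaskManagers exist when that happens. In the production job, 140 groups over 4-slot TaskManagers gives required_tms(140, 4) = 35, so the delay amounts to waiting for every TaskManager.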
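Implementation snippets:
1. Before requesting slots, sort the ExecutionSlotSharingGroups by the number of tasks they contain and schedule the groups with more tasks first.
2. Delay task scheduling until enough TaskManagers have registered for the ExecutionSlotSharingGroups to be distributed evenly, then request slots for them.
Effect : After applying the optimizations, the tasks of each vertex are scheduled evenly across different TaskManagers.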
[Figure: Task placement after optimization; the tasks of each vertex are spread evenly across different TaskManager nodes.]
Performance Comparison :
CPU Load – Before optimization, some nodes stayed at 100% CPU for long periods; after optimization, CPU load is more evenly distributed and no node stays at 100% for a sustained period.
Data Backlog – After optimization, the data backlog is roughly half of what it was before, reflecting higher throughput and lower latency.
Further Considerations :
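Task Balancing – For a topology such as Vertex A(p=3) → Vertex B(p=4) → Vertex C(p=1), building balanced groups up front, such as {A1,B1}, {A2,B2}, {A3,B3}, {B4,C1}, gives every group the same number of tasks while keeping each A subtask in the same group as its B counterpart, which limits cross-node communication overhead.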
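One way to build such groups is a greedy pass, sketched below in Python. This is an illustrative technique, not the author's or Flink's implementation: `balanced_groups` is a hypothetical helper that places vertices in descending order of parallelism and puts each vertex's subtasks into the groups that currently hold the fewest tasks, breaking ties by group index. It only balances task counts; keeping Ai next to Bi falls out of the index tie-break here, and the sketch does not inspect edge types (forward versus all-to-all), so it is not a general locality-preserving algorithm. For comparison, a naive index-based grouping (subtask i of every vertex in group i) would give {A1,B1,C1}, {A2,B2}, {A3,B3}, {B4}.

```python
from typing import Dict, List


def balanced_groups(parallelism: Dict[str, int]) -> List[List[str]]:
    """Greedy sketch: place vertices in descending parallelism order, and put
    each vertex's subtasks into the groups that currently hold the fewest tasks."""
    num_groups = max(parallelism.values())
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    for vertex, p in sorted(parallelism.items(), key=lambda kv: -kv[1]):
        # The p least-loaded groups; ties go to the lower index, so subtask i
        # of consecutive vertices tends to end up in the same group.
        targets = sorted(range(num_groups), key=lambda g: (len(groups[g]), g))[:p]
        for subtask, g in enumerate(sorted(targets)):
            groups[g].append(f"{vertex}{subtask + 1}")
    return groups


if __name__ == "__main__":
    groups = balanced_groups({"A": 3, "B": 4, "C": 1})
    print([sorted(g) for g in groups])
    # [['A1', 'B1'], ['A2', 'B2'], ['A3', 'B3'], ['B4', 'C1']]
```

Every group ends up with two tasks, so any slot assignment across TaskManagers starts from equally sized units instead of relying on sorting to compensate for uneven ones.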
Delayed Scheduling Improvement – Incorporate the delay strategy into Flink's execution plan generation phase to reduce the startup latency perceived by users.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
