Flink Task Auto-scaling Design and Implementation

This article presents the design and implementation of Flink task auto‑scaling, covering background, manual and automatic scaling mechanisms, architecture with RescaleCoordinator, persistence via Zookeeper and HDFS, scaling policies for parallelism, CPU and memory, and future plans for fine‑grained and time‑based resource adjustments.

HomeTech

Dec 7, 2021

Background: As internal Flink usage grew, pressure to reduce costs and improve efficiency led the team to build task scaling capabilities.

Manual scaling: Users can adjust a job's resources by hand. New containers are pre-allocated and the job performs a Recover instead of a full restart, which cuts the business impact of a rescale from minutes to seconds.

Automatic scaling: The system adjusts job parallelism, TaskManager CPU, and TaskManager memory automatically, based on user-defined policies.

Design steps: (1) Request new Containers from the ResourceManager via the SlotPool and mark them; (2) stop the job and delete its ExecutionGraph; (3) release the old TaskManagers, rebuild the ExecutionGraph, and restore from the savepoint on the marked TaskManagers; (4) persist the new resource settings to Zookeeper and HDFS so they survive an HA recovery.
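The ordering of these steps can be sketched in Java. This is a minimal illustration, not Flink's API: `ClusterOps`, `RecordingOps`, and every method name are hypothetical, and the sketch assumes that stopping the job yields the savepoint path used for the restore, which the article does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: all names are hypothetical, not Flink's API. */
final class RescaleStepsSketch {

    /** Operations the sketch assumes the JobMaster can perform. */
    interface ClusterOps {
        List<String> requestMarkedContainers(int count);              // step 1
        String stopJobAndDropExecutionGraph();                        // step 2 (returns savepoint path: assumption)
        void releaseOldTaskManagers();                                // step 3, first half
        void rebuildAndRestore(List<String> marked, String savepoint); // step 3, second half
        void persistResourceSettings(int containerCount);             // step 4
    }

    /** Runs the four steps in order; the job is only stopped once new containers are in hand. */
    static void rescale(ClusterOps ops, int containerCount) {
        List<String> marked = ops.requestMarkedContainers(containerCount);
        if (marked.size() < containerCount) {
            throw new IllegalStateException("not enough containers; job left running");
        }
        String savepoint = ops.stopJobAndDropExecutionGraph();
        ops.releaseOldTaskManagers();
        ops.rebuildAndRestore(marked, savepoint);
        ops.persistResourceSettings(containerCount);
    }

    /** In-memory stand-in that records the order of calls. */
    static final class RecordingOps implements ClusterOps {
        final List<String> calls = new ArrayList<>();

        public List<String> requestMarkedContainers(int count) {
            calls.add("request");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                ids.add("container-" + i);
            }
            return ids;
        }

        public String stopJobAndDropExecutionGraph() {
            calls.add("stop");
            return "savepoint-path-placeholder";
        }

        public void releaseOldTaskManagers() {
            calls.add("release");
        }

        public void rebuildAndRestore(List<String> marked, String savepoint) {
            calls.add("restore");
        }

        public void persistResourceSettings(int containerCount) {
            calls.add("persist");
        }
    }
}
```

Requesting containers before stopping the job is what keeps the interruption short: the slow part, acquiring resources, happens while the old job is still running.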

Architecture: A new RescaleCoordinator component runs inside the JobManager and is covered by HA. It periodically checks whether a job needs scaling and, if so, notifies the Dispatcher. The Dispatcher tells the JobMaster, which requests new TaskManagers from the ResourceManager, releases the old ones, reschedules the job, and persists the result.
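The periodic check could look roughly like the sketch below. Only the name RescaleCoordinator and its role come from the article; the `Dispatcher` interface, the `rescaleJob` method, and the policy supplier are assumptions made for illustration, and the real component talks to Flink's internal RPC gateways rather than a plain Java interface.

```java
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative only: a periodic checker that forwards rescale decisions. */
final class RescaleCoordinatorSketch implements AutoCloseable {

    /** Hypothetical stand-in for the Dispatcher's rescale entry point. */
    interface Dispatcher {
        void rescaleJob(String jobId, int newParallelism);
    }

    private final String jobId;
    private final Supplier<Optional<Integer>> policy; // empty means "no change needed"
    private final Dispatcher dispatcher;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    RescaleCoordinatorSketch(String jobId,
                             Supplier<Optional<Integer>> policy,
                             Dispatcher dispatcher) {
        this.jobId = jobId;
        this.policy = policy;
        this.dispatcher = dispatcher;
    }

    /** Starts the periodic check. */
    void start(long periodSeconds) {
        timer.scheduleWithFixedDelay(this::safeCheck, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** One evaluation round: ask the policy, notify the Dispatcher if needed. */
    void checkOnce() {
        policy.get().ifPresent(p -> dispatcher.rescaleJob(jobId, p));
    }

    /** An uncaught exception would cancel all future runs, so swallow and report it. */
    private void safeCheck() {
        try {
            checkOnce();
        } catch (RuntimeException e) {
            System.err.println("rescale check failed: " + e.getMessage());
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Keeping `checkOnce` separate from the timer makes a single evaluation round easy to exercise without waiting for the schedule.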

Scaling policies: Parallelism is adjusted when Kafka latency is high while CPU usage is low (an IO-intensive job), or when slots sit idle. CPU is scaled based on utilization. Memory is scaled based on usage and GC behavior.
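A rule-based reading of these policies might be sketched as follows. The article gives no thresholds and does not state the direction of each adjustment, so the constants, the metric names, and the mapping (high lag with low CPU increases parallelism, idle slots decrease it) are all assumptions for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: thresholds and names are assumptions, not from the article. */
final class ScalingPolicySketch {

    enum Action {
        INCREASE_PARALLELISM, DECREASE_PARALLELISM,
        INCREASE_CPU, DECREASE_CPU,
        INCREASE_MEMORY, DECREASE_MEMORY
    }

    /** Snapshot of the metrics the policies look at. */
    static final class Metrics {
        final long kafkaLagMs;       // consumer latency
        final double cpuUtilization; // 0.0 to 1.0
        final int idleSlots;
        final double heapUsage;      // 0.0 to 1.0
        final double gcTimeRatio;    // fraction of time spent in GC

        Metrics(long kafkaLagMs, double cpuUtilization, int idleSlots,
                double heapUsage, double gcTimeRatio) {
            this.kafkaLagMs = kafkaLagMs;
            this.cpuUtilization = cpuUtilization;
            this.idleSlots = idleSlots;
            this.heapUsage = heapUsage;
            this.gcTimeRatio = gcTimeRatio;
        }
    }

    // Hypothetical thresholds; a real deployment would make these user-configurable.
    static final long LAG_HIGH_MS = 60_000L;
    static final double CPU_LOW = 0.30;
    static final double CPU_HIGH = 0.80;
    static final double HEAP_LOW = 0.30;
    static final double HEAP_HIGH = 0.85;
    static final double GC_HIGH = 0.10;

    /** Evaluates the three rule groups independently and returns the suggested actions. */
    static Set<Action> evaluate(Metrics m) {
        Set<Action> actions = EnumSet.noneOf(Action.class);

        // Parallelism: lagging but not CPU-bound suggests an IO-intensive job.
        if (m.kafkaLagMs > LAG_HIGH_MS && m.cpuUtilization < CPU_LOW) {
            actions.add(Action.INCREASE_PARALLELISM);
        } else if (m.idleSlots > 0) {
            actions.add(Action.DECREASE_PARALLELISM);
        }

        // CPU: driven by utilization alone.
        if (m.cpuUtilization > CPU_HIGH) {
            actions.add(Action.INCREASE_CPU);
        } else if (m.cpuUtilization < CPU_LOW) {
            actions.add(Action.DECREASE_CPU);
        }

        // Memory: driven by heap usage and GC pressure.
        if (m.heapUsage > HEAP_HIGH || m.gcTimeRatio > GC_HIGH) {
            actions.add(Action.INCREASE_MEMORY);
        } else if (m.heapUsage < HEAP_LOW) {
            actions.add(Action.DECREASE_MEMORY);
        }
        return actions;
    }
}
```

For example, `evaluate(new Metrics(120_000L, 0.10, 0, 0.50, 0.01))` includes `INCREASE_PARALLELISM`, since the job lags while its CPU is mostly idle.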

Future plans: Use the differing peak times of offline and online workloads to improve cluster utilization through time-based resource adjustments, and explore fine-grained scaling of individual TaskManagers.
