Unveiling Flink’s Multi‑Layer Execution Graph: From StreamGraph to Physical Deployment

This article explains Flink’s architecture, detailing the roles of Client, JobManager and TaskManager, walks through a SocketTextStreamWordCount example, and clarifies the four‑layer graph model—StreamGraph, JobGraph, ExecutionGraph, and the physical execution graph—highlighting why each layer exists.

21CTO

Architecture

To understand a system, we start with its architecture. After a Flink cluster starts, a JobManager and one or more TaskManager processes are launched. A Client submits jobs to the JobManager, which schedules tasks to the TaskManagers. TaskManagers report heartbeats and statistics back to the JobManager, and they exchange data streams among themselves. All three components run as independent JVM processes.

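Client: Submits a job to the JobManager and can run on any machine that can reach it over the network. After submission, the Client may exit (typical for streaming jobs) or stay connected and wait for the result.

JobManager: Schedules jobs and coordinates checkpoints, a role similar to Nimbus in Storm. After receiving a job from the Client, it generates an optimized execution plan and dispatches it to the TaskManagers as tasks.

TaskManager: Starts with a configured number of slots; each slot can run one task, and each task is a thread. The TaskManager receives tasks from the JobManager, deploys them, opens Netty connections to their upstream tasks, and then receives and processes data.

Flink therefore schedules work with a multi-threaded model: tasks, possibly from different jobs, run as threads inside the same TaskManager process. This keeps CPU utilization high, but it provides no resource isolation between tasks and makes debugging harder than in a model where each JVM runs the tasks of a single job.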
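The slot-and-thread model can be pictured as a fixed-size thread pool inside a single JVM. The following is a toy sketch only, not Flink's implementation: SlotSketch and runTasks are hypothetical names, and a real TaskManager keeps a task in its slot for the lifetime of the task rather than queuing short units of work as a thread pool does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model: one JVM process (the "TaskManager") with a fixed number of
// worker threads (the "slots"). All names here are hypothetical.
class SlotSketch {

    // Runs every task on a pool of `slots` threads and returns a map
    // from task name to the name of the thread that executed it.
    static Map<String, String> runTasks(int slots, List<String> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        Map<String, String> placement = new ConcurrentHashMap<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (String task : tasks) {
                pending.add(pool.submit(() -> {
                    // Every task shares the same heap: no isolation between them.
                    placement.put(task, Thread.currentThread().getName());
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait until the task has finished
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            pool.shutdown();
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, String> placement =
            runTasks(2, List.of("Source", "Flat Map", "Keyed Aggregation", "Sink"));
        placement.forEach((task, thread) -> System.out.println(task + " ran on " + thread));
    }
}
```

Because all tasks write into the same map on the same heap, the sketch also hints at why this model offers no isolation: a misbehaving task can affect every other task in the process.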

Job Example

We use Flink’s built‑in SocketTextStreamWordCount example, which counts word occurrences from a socket stream.

Start a local server with netcat.

Submit the Flink program.

When words are entered into the netcat terminal, the TaskManager output shows the word counts.
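To make the expected behavior concrete, here is a plain-Java sketch of a running word count. It does not use the Flink API; RunningWordCount and onLine are hypothetical names, and the tokenization (lowercase, split on non-word characters) is an assumption about what the example does. Each incoming word produces an updated (word,count) pair, which is how a keyed running sum behaves on a stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a running word count; not Flink code.
class RunningWordCount {
    private final Map<String, Integer> counts = new HashMap<>();

    // Processes one line of input and returns one "(word,count)" update
    // per word, in the order the words appear in the line.
    List<String> onLine(String line) {
        List<String> updates = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            int count = counts.merge(token, 1, Integer::sum);
            updates.add("(" + token + "," + count + ")");
        }
        return updates;
    }

    public static void main(String[] args) {
        RunningWordCount wc = new RunningWordCount();
        System.out.println(wc.onLine("hello flink")); // [(hello,1), (flink,1)]
        System.out.println(wc.onLine("Hello again")); // [(hello,2), (again,1)]
    }
}
```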

If you replace the final line env.execute(); with System.out.println(env.getExecutionPlan()); and run the program with parallelism 2, Flink prints a JSON representation of the logical execution plan. The JSON can be pasted into the visualizer at flink.apache.org/visualizer to render the graph.

Graph

Flink’s execution graphs consist of four layers:

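StreamGraph: The initial graph, generated directly from the code the user writes with the Stream API. It represents the topology of the program.

JobGraph: The StreamGraph after optimization; this is the data structure submitted to the JobManager. The main optimization is chaining multiple eligible operators into a single JobVertex, which reduces the serialization, deserialization, and network transfer needed between operators.

ExecutionGraph: The parallelized, distributed version of the JobGraph, built by the JobManager. It is the core data structure of the scheduling layer.

Physical Execution Graph: What results once the JobManager has scheduled the job according to the ExecutionGraph and the tasks have been deployed on the TaskManagers. It is not a concrete data structure but the runtime view of the job.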
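The chaining step from StreamGraph to JobGraph can be sketched for a simple linear pipeline. This is a toy model under stated assumptions, not Flink's algorithm: Op and Vertex are hypothetical stand-ins for StreamNode and JobVertex, and the rule used here (chain when the input is forwarded without repartitioning and the parallelism matches) is a simplification of the conditions Flink checks. The operator names and parallelism values in main are likewise assumed for a word count run with parallelism 2.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of operator chaining for a linear pipeline; not Flink code.
class ChainingSketch {

    // Hypothetical stand-in for a StreamNode plus the edge leading into it.
    static final class Op {
        final String name;
        final int parallelism;
        final boolean forwardInput; // true if input arrives without repartitioning

        Op(String name, int parallelism, boolean forwardInput) {
            this.name = name;
            this.parallelism = parallelism;
            this.forwardInput = forwardInput;
        }
    }

    // Hypothetical stand-in for a JobVertex: one or more chained operators.
    static final class Vertex {
        final List<String> operators = new ArrayList<>();
        final int parallelism;

        Vertex(int parallelism) {
            this.parallelism = parallelism;
        }
    }

    // Walks the pipeline in order and merges an operator into the previous
    // vertex when the simplified chaining rule holds.
    static List<Vertex> chain(List<Op> pipeline) {
        List<Vertex> jobGraph = new ArrayList<>();
        for (Op op : pipeline) {
            Vertex last = jobGraph.isEmpty() ? null : jobGraph.get(jobGraph.size() - 1);
            boolean chainable = last != null
                && op.forwardInput
                && last.parallelism == op.parallelism;
            if (!chainable) {
                last = new Vertex(op.parallelism);
                jobGraph.add(last);
            }
            last.operators.add(op.name);
        }
        return jobGraph;
    }

    public static void main(String[] args) {
        List<Op> streamGraph = List.of(
            new Op("Source", 1, false),
            new Op("Flat Map", 2, false),          // parallelism changes
            new Op("Keyed Aggregation", 2, false), // repartitioned by key
            new Op("Sink", 2, true));              // forwarded, same parallelism
        for (Vertex v : chain(streamGraph)) {
            System.out.println(String.join(" -> ", v.operators) + " x" + v.parallelism);
        }
    }
}
```

Under these assumptions the four operators collapse into three vertices, with the aggregation and the sink sharing one vertex, so records pass between them without being serialized or sent over the network.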

Key sub‑components include:

StreamNode, StreamEdge: Nodes and edges in the StreamGraph.

JobVertex, IntermediateDataSet, JobEdge: Elements of the JobGraph.

ExecutionJobVertex, ExecutionVertex, IntermediateResult, ExecutionEdge: Elements of the ExecutionGraph.

Task, ResultPartition, ResultSubpartition, InputGate, InputChannel: Elements of the physical execution graph.
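The relationship between the JobGraph and ExecutionGraph elements listed above comes down to parallelism: each JobVertex is expanded into one unit of work per parallel subtask, and each of those is eventually deployed as a Task. The sketch below illustrates only that expansion. ExpansionSketch and expand are hypothetical names, the vertex names and parallelism values are assumed for illustration, and the real ExecutionGraph also tracks intermediate results and edges, which are omitted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of expanding job vertices into parallel subtasks; not Flink code.
class ExpansionSketch {

    // Takes job vertices (name -> parallelism) and returns one label per
    // parallel subtask, for example "Flat Map (1/2)".
    static List<String> expand(Map<String, Integer> jobVertices) {
        List<String> subtasks = new ArrayList<>();
        for (Map.Entry<String, Integer> vertex : jobVertices.entrySet()) {
            int parallelism = vertex.getValue();
            for (int i = 1; i <= parallelism; i++) {
                subtasks.add(vertex.getKey() + " (" + i + "/" + parallelism + ")");
            }
        }
        return subtasks;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the vertices in pipeline order.
        Map<String, Integer> jobGraph = new LinkedHashMap<>();
        jobGraph.put("Source", 1);
        jobGraph.put("Flat Map", 2);
        jobGraph.put("Keyed Aggregation -> Sink", 2);
        expand(jobGraph).forEach(System.out::println); // prints five subtask labels
    }
}
```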

This layered design decouples the stages of job processing: each graph corresponds to one stage and has a single responsibility, which makes optimization, scheduling, and monitoring easier. Spark takes a similar approach by separating its logical and physical DAGs.

Overall, the four‑graph architecture provides clear responsibilities at each stage, facilitating both batch and stream processing within Flink.
