
Unveiling Flink’s Multi‑Layer Execution Graph: From StreamGraph to Physical Deployment

This article explains Flink’s architecture, detailing the roles of Client, JobManager and TaskManager, walks through a SocketTextStreamWordCount example, and clarifies the four‑layer graph model—StreamGraph, JobGraph, ExecutionGraph, and the physical execution graph—highlighting why each layer exists.

21CTO

Aug 14, 2017

Architecture

To understand a system, we start with its architecture. After a Flink cluster starts, a JobManager and one or more TaskManager processes are launched. A Client submits jobs to the JobManager, which schedules tasks to the TaskManagers. TaskManagers report heartbeats and statistics back to the JobManager, and they exchange data streams among themselves. All three components run as independent JVM processes.

Client : Submits a job from any machine that can reach the JobManager. After submission it may exit (for streaming jobs) or wait for results.

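JobManager : Schedules jobs and coordinates checkpoints across tasks, a role similar to that of Nimbus in Storm. After receiving a job from the Client, it generates an optimized execution plan and schedules it onto the TaskManagers as tasks.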

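TaskManager : Holds a configurable number of slots. Each slot can run one task, and each task is a thread. The TaskManager receives the tasks to deploy from the JobManager; once a task has started, it opens Netty connections to its upstream tasks, then receives and processes their data.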

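Flink runs tasks as threads, and many tasks can share a single TaskManager process. This multi-threaded model achieves high CPU utilization, but it provides no resource isolation between tasks and is harder to debug than a model in which each JVM runs the tasks of only one job.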

Job Example

We use Flink’s built‑in SocketTextStreamWordCount example, which counts word occurrences from a socket stream.
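The example's source is not reproduced here, so as a rough illustration of what the job computes, the following plain-Java sketch splits each line into words and keeps a running count per word, emitting an updated count every time a word is seen. It does not use Flink at all; the class and method names (WordCountSketch, tokenize, processLine), the tokenization rule, and the output format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for the word-count logic; this is not Flink code. */
final class WordCountSketch {

    /** Running total per word, i.e. the state a keyed sum would maintain. */
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Splits a line into lowercase words and drops empty tokens. */
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Processes one input line and returns the "(word,count)" records it emits. */
    List<String> processLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String word : tokenize(line)) {
            // merge() adds 1 to the existing total, or starts the word at 1.
            int updated = counts.merge(word, 1, Integer::sum);
            emitted.add("(" + word + "," + updated + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        WordCountSketch job = new WordCountSketch();
        System.out.println(job.processLine("hello flink")); // prints [(hello,1), (flink,1)]
        System.out.println(job.processLine("hello world")); // prints [(hello,2), (world,1)]
    }
}
```

Because the count is updated per record rather than per batch, a repeated word produces a new record each time, which is why the second line reports hello with a count of 2.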

Start a local server with netcat.

Submit the Flink program.

When words are entered into the netcat terminal, the TaskManager output shows the word counts.

Replace the program's final env.execute(); call with System.out.println(env.getExecutionPlan()); and run it with parallelism 2. Flink then prints a JSON representation of the logical execution plan, which can be visualized at flink.apache.org/visualizer.

Graph

Flink’s execution graphs consist of four layers:

StreamGraph : Generated directly from the user’s Stream API code; represents the program’s topology.

JobGraph : Optimized version of the StreamGraph, and the data structure that is submitted to the JobManager. The main optimization chains multiple eligible operators into a single JobVertex, which avoids serialization and network transfer between them.

ExecutionGraph : Parallelized version of the JobGraph, created by the JobManager; it is the core data structure of the scheduling layer.

Physical Execution Graph : The actual deployment of tasks on TaskManagers; not a concrete data structure but the runtime view of the job.
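To make the chaining step from StreamGraph to JobGraph more concrete, here is a toy model in plain Java. It is a deliberately simplified sketch, not Flink's implementation: it handles only a linear pipeline and assumes two operators can be chained when records are forwarded one-to-one and both have the same parallelism, whereas Flink checks further conditions. The class ChainingSketch, its Operator type, and the operator names and parallelism values are all hypothetical, loosely modeled on the word count job.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of operator chaining; simplified and not Flink's actual algorithm. */
final class ChainingSketch {

    /** One operator in a linear pipeline (a stand-in for a StreamNode). */
    static final class Operator {
        final String name;
        final int parallelism;
        /** True if records arrive from the previous operator one-to-one, without redistribution. */
        final boolean forwardInput;

        Operator(String name, int parallelism, boolean forwardInput) {
            this.name = name;
            this.parallelism = parallelism;
            this.forwardInput = forwardInput;
        }
    }

    /** Groups adjacent chainable operators; each resulting string stands in for one JobVertex. */
    static List<String> chain(List<Operator> pipeline) {
        List<String> vertices = new ArrayList<>();
        Operator previous = null;
        for (Operator op : pipeline) {
            boolean chainable = previous != null
                    && op.forwardInput
                    && op.parallelism == previous.parallelism;
            if (chainable) {
                // Append to the vertex that already contains the previous operator.
                int last = vertices.size() - 1;
                vertices.set(last, vertices.get(last) + " -> " + op.name);
            } else {
                vertices.add(op.name);
            }
            previous = op;
        }
        return vertices;
    }

    /** Hypothetical word-count pipeline with assumed parallelism values. */
    static List<Operator> wordCountPipeline() {
        List<Operator> pipeline = new ArrayList<>();
        pipeline.add(new Operator("Source", 1, false));
        pipeline.add(new Operator("Flat Map", 2, false));          // parallelism changes
        pipeline.add(new Operator("Keyed Aggregation", 2, false)); // partitioned by key
        pipeline.add(new Operator("Sink", 2, true));               // forwarded one-to-one
        return pipeline;
    }

    public static void main(String[] args) {
        System.out.println(chain(wordCountPipeline()));
        // prints [Source, Flat Map, Keyed Aggregation -> Sink]
    }
}
```

In this model four operators collapse into three vertices: only the aggregation and the sink satisfy both conditions, so only they end up in the same vertex.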

Key sub‑components include:

StreamNode / StreamEdge : Nodes and edges in the StreamGraph.

JobVertex, IntermediateDataSet, JobEdge : Elements of the JobGraph.

ExecutionJobVertex, ExecutionVertex, IntermediateResult, ExecutionEdge : Elements of the ExecutionGraph.

Task, ResultPartition, ResultSubpartition, InputGate, InputChannel : Elements of the physical execution.
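The relationship between a JobVertex and its ExecutionVertex instances can be sketched as a simple expansion: one vertex with parallelism p yields p parallel subtasks. The plain-Java model below only illustrates that idea and is not Flink's scheduler code; ExpansionSketch, its methods, and the vertex names and parallelism values are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of expanding job vertices into parallel subtasks; not Flink code. */
final class ExpansionSketch {

    /** Expands one vertex into its subtasks (stand-ins for ExecutionVertex). */
    static List<String> expand(String vertexName, int parallelism) {
        List<String> subtasks = new ArrayList<>();
        for (int i = 1; i <= parallelism; i++) {
            subtasks.add(vertexName + " (" + i + "/" + parallelism + ")");
        }
        return subtasks;
    }

    /** Expands every vertex of a job, preserving the order of the input map. */
    static List<String> expandAll(Map<String, Integer> jobVertices) {
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, Integer> vertex : jobVertices.entrySet()) {
            all.addAll(expand(vertex.getKey(), vertex.getValue()));
        }
        return all;
    }

    /** Hypothetical job vertices (name to parallelism) for a word count job. */
    static Map<String, Integer> wordCountJobVertices() {
        Map<String, Integer> vertices = new LinkedHashMap<>();
        vertices.put("Source", 1);
        vertices.put("Flat Map", 2);
        vertices.put("Keyed Aggregation -> Sink", 2);
        return vertices;
    }

    public static void main(String[] args) {
        // Three job vertices expand into 1 + 2 + 2 = 5 subtasks.
        for (String subtask : expandAll(wordCountJobVertices())) {
            System.out.println(subtask);
        }
    }
}
```

Each line printed by this sketch corresponds to one unit that would have to be deployed, which is why the parallelized graph, rather than the JobGraph, is what the scheduling layer works with.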

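This layered design decouples the stages of job processing: each graph corresponds to one stage of a job and has a single responsibility, which makes optimization, scheduling, and monitoring easier at that stage. Spark takes a similar approach by separating its logical and physical DAGs.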

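Overall, the four-graph architecture gives each stage a clear responsibility. Because only the topmost layer, the StreamGraph, is specific to the Stream API, the layers from the JobGraph down can serve both batch and stream programs within Flink.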
