Comprehensive Guide to Flink Deployment, State Programming, Checkpointing, and Performance Tuning
This article provides an extensive overview of Apache Flink, covering deployment modes, cluster sizing, job submission workflows, state programming concepts, checkpoint mechanisms, backpressure handling, comparison with Spark, and practical code snippets for configuration and optimization.
1. Flink Submission Series
1.1 How to submit Flink?
Local mode : JobManager and TaskManager share a JVM, requiring only a JDK. Suitable for single‑node debugging.
Standalone mode : Flink's built‑in distributed cluster; JobManager acts as Master, TaskManagers as Workers, independent of other resource managers.
YARN mode : Flink runs on YARN in one of three sub-modes.

Session mode : JobManager and TaskManagers are pre-started and shared by all jobs; the cluster stops only when the session is closed. Advantage: resources are fully shared and jobs start quickly. Disadvantage: weak isolation, and scaling TaskManagers is difficult. The main method runs on the client.

Per-job mode : each job gets its own JobManager and TaskManagers, and the cluster stops when the job stops. Advantage: strong isolation, with resources allocated per job. Disadvantage: slower job startup, since every job launches its own JobManager. The main method runs on the client.

Application mode : each application runs with a dedicated JobManager and TaskManager set, and the cluster stops when the application finishes. Advantage: low client load, isolation between applications, and resource sharing inside an application; effectively an optimized blend of per-job and session modes. The main method runs on the cluster.
1.2 Flink cluster sizing and project responsibilities
Key factors include record count and size, number of keys and state size per key, state update frequency, network capacity, disk bandwidth, and the number of machines together with their CPU and memory.
Typical project tasks: real‑time monitoring (user behavior alerts, server attack alerts), real‑time dashboards, data products, data‑driven operations, stream analytics, and real‑time data warehouses.
1.3 Flink job submission workflow and YARN interaction
Upload Flink JAR and configuration to HDFS for shared access.
Client submits the job to ResourceManager, which allocates a container and starts an ApplicationMaster.
ApplicationMaster loads HDFS config, starts JobManager, which builds the execution graph.
JobManager requests resources from ResourceManager; containers are launched for TaskManagers.
TaskManagers send heartbeats; JobManager assigns tasks.
1.4 Flink job submission commands and parameters
# Submit to a running YARN session
./bin/flink run -t yarn-session \
  -Dyarn.application.id=application_XXXX_YY \
  ./examples/streaming/TopSpeedWindowing.jar

# Per-job mode, detached
./bin/flink run -t yarn-per-job \
  --detached ./examples/streaming/TopSpeedWindowing.jar

# Application mode
./bin/flink run-application -t yarn-application \
  ./examples/streaming/TopSpeedWindowing.jar

# Common options
#   -c <class>                  entry class of the jar
#   -Dyarn.application.type=... YARN application type shown in the ResourceManager UI
#   -p <n>                      job parallelism
#   -yn <n>                     number of TaskManagers (legacy CLI, deprecated)
#   ... (additional flags omitted for brevity)

1.5 JobManager overview
The JobManager controls each Flink application, receives the JobGraph, creates an ExecutionGraph, requests slots from TaskManagers, and coordinates checkpoints and cancellations.
Typical deployment uses one primary JobManager with two standby instances managed via ZooKeeper for high availability.
1.6 TaskManager overview
TaskManagers run the actual tasks, each containing a configurable number of slots. Slots define the parallel execution capacity; a TaskManager can have multiple slots, and a common guideline is to set the slot count no higher than the number of CPU cores.
1.7 Slots and parallelism
Slots are the smallest execution unit; parallelism cannot exceed the total number of slots. Example: three TaskManagers each with 3 slots provide 9 slots total; setting parallelism to 1 would use only one slot.
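The slot arithmetic above can be sanity-checked with a plain-Java sketch (the class and method names are illustrative, not Flink API):

```java
public class SlotMath {
    // Total slots available across the cluster.
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // A job can only be scheduled if its parallelism does not exceed
    // the total slot count.
    static boolean fits(int parallelism, int taskManagers, int slotsPerTaskManager) {
        return parallelism <= totalSlots(taskManagers, slotsPerTaskManager);
    }

    public static void main(String[] args) {
        System.out.println(totalSlots(3, 3)); // 9 slots in total
        System.out.println(fits(9, 3, 3));    // true: cluster fully utilized
        System.out.println(fits(10, 3, 3));   // false: job cannot be scheduled
    }
}
```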
1.8 Operator chains
Flink chains operators into a single task to reduce thread switches, serialization, and buffer copying, thereby lowering latency and increasing throughput.
1.9 Dependency on Hadoop
Flink can run independently of Hadoop but integrates with YARN for resource scheduling and can read/write HDFS for checkpoints.
2. State Programming Series
2.1 Operator state vs. keyed state
Keyed state is partitioned by key and provides isolation per key (e.g., ValueState, ListState, MapState, ReducingState, AggregatingState). Operator state is scoped to the operator instance and is used for sources, sinks, or when no key is defined.
2.2 Choosing between ValueState and ListState for an array of 10 ints
Keyed state: use a ValueState holding the whole array (e.g., ValueState<int[]>) for update-in-place access, or a ListState<Integer> for additive operations. Operator state: prefer ListState .
2.3 Using MapState for grouping by id
MapState is a keyed state; the key must be part of the stream (via keyBy ) before using it.
2.4 Managing Kafka offsets with state
Offsets are stored in checkpointed state, typically as a ListState<Tuple2<KafkaTopicPartition, Long>>. The following code (abridged from FlinkKafkaConsumerBase) shows snapshot handling:

private transient ListState<Tuple2<KafkaTopicPartition, Long>> unionOffsetStates;

public void snapshotState(FunctionSnapshotContext context) throws Exception {
    if (!running) {
        LOG.debug("snapshotState() called on closed source");
    } else {
        unionOffsetStates.clear();
        // fetch current offsets and add each partition/offset pair
        // to unionOffsetStates
        // in ON_CHECKPOINTS commit mode, remember the offsets so they
        // can be committed back to Kafka once the checkpoint completes
    }
}

2.5 Window processing: first and last element
Use a WindowFunction and retrieve the first and last element from the iterable input.
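A minimal plain-Java sketch of what the apply() body would do with its iterable input (the helper is illustrative; elements arrive in processing order, so sort by timestamp first if event-time order matters):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FirstLast {
    // Mirrors what a WindowFunction's apply() body would do with its
    // Iterable<IN> input: walk it once, keeping the first and last element.
    static <T> List<T> firstAndLast(Iterable<T> elements) {
        Iterator<T> it = elements.iterator();
        T first = it.next(); // assumes a non-empty window
        T last = first;
        while (it.hasNext()) {
            last = it.next();
        }
        return Arrays.asList(first, last);
    }

    public static void main(String[] args) {
        System.out.println(firstAndLast(Arrays.asList(3, 1, 4, 1, 5))); // [3, 5]
    }
}
```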
3. Backpressure and Issues Series
3.1 Monitoring Flink and handling backpressure
Metrics include job uptime, restarts, record rates, CPU, heap usage, GC, network buffers, and checkpoint statistics. Backpressure is detected via the ratio of blocked threads to total threads; mitigation includes scaling resources, optimizing code, and adjusting parallelism.
3.2 Dealing with OOM caused by large state (e.g., PV/UV)
Offload the deduplication state to Redis, or use a Bloom filter to bound memory; note that Bloom filters have a small false-positive rate.
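As a sketch of the Bloom-filter approach, here is a minimal plain-Java filter (the size and hash scheme are illustrative, not tuned production values; a real job would typically use a library implementation such as Guava's):

```java
import java.util.BitSet;

// Minimal Bloom filter sketch for UV-style deduplication.
public class BloomDedup {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomDedup(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Simple seeded polynomial hash over the key's characters.
    private int index(String key, int seed) {
        int h = 0;
        for (char c : key.toCharArray()) {
            h = h * seed + c;
        }
        return Math.floorMod(h, size);
    }

    // Returns true the first time a key is seen. May rarely return false
    // for a genuinely new key: that is the false-positive rate.
    boolean addIfAbsent(String key) {
        boolean seen = true;
        for (int i = 0; i < hashes; i++) {
            int idx = index(key, 31 + i * 2);
            if (!bits.get(idx)) {
                seen = false;
                bits.set(idx);
            }
        }
        return !seen;
    }

    public static void main(String[] args) {
        BloomDedup f = new BloomDedup(1 << 20, 3);
        System.out.println(f.addIfAbsent("user-42")); // true: count this user
        System.out.println(f.addIfAbsent("user-42")); // false: duplicate
    }
}
```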
3.3 Ensuring GMV correctness after MySQL corrections
Use offline jobs to recompute and correct GMV after source data changes.
4. Spark vs. Flink Comparison
Key differences: architecture (Spark: Master/Worker/Driver/Executor; Flink: JobManager/TaskManager/Slot), scheduling (Spark processes micro-batches, while Flink builds a JobGraph → ExecutionGraph for true streaming), time semantics (Spark's DStream API supports only processing time, with Structured Streaming adding event time; Flink supports processing, event, and ingestion time with watermarks), and fault tolerance (Spark Streaming guarantees at-least-once by default; Flink achieves exactly-once internally via checkpoints and end-to-end via two-phase-commit sinks).
5. Checkpoint Series
5.1 Checkpoint implementation
Flink uses asynchronous barrier snapshots (Chandy‑Lamport algorithm) instead of pausing the job. The CheckpointCoordinator triggers barriers, sources inject them, downstream operators align barriers, and the JobManager collects state handles.
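The alignment step can be sketched as a toy two-input operator in plain Java (names are illustrative; real Flink tracks barriers per checkpoint id across many input channels):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy model of exactly-once barrier alignment on a two-input operator:
// once a barrier arrives on one channel, that channel's records are
// buffered until the matching barrier arrives on the other channel.
public class BarrierAlignment {
    private final boolean[] barrierSeen = new boolean[2];
    private final ArrayDeque<String> buffered = new ArrayDeque<>();
    final List<String> processed = new ArrayList<>();

    void onRecord(int channel, String record) {
        if (barrierSeen[channel]) {
            buffered.add(record);  // hold back until alignment completes
        } else {
            processed.add(record); // belongs to the pre-checkpoint epoch
        }
    }

    void onBarrier(int channel) {
        barrierSeen[channel] = true;
        if (barrierSeen[0] && barrierSeen[1]) {
            // Alignment complete: the state snapshot would be taken here,
            // then the buffered post-barrier records are released.
            while (!buffered.isEmpty()) {
                processed.add(buffered.poll());
            }
            barrierSeen[0] = barrierSeen[1] = false;
        }
    }
}
```

Note how a record arriving on a channel after that channel's barrier is not processed until the other channel's barrier arrives, which is exactly what keeps post-barrier records out of the snapshot.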
5.2 State backends
MemoryStateBackend : state kept in JVM heap; checkpoints stored in JobManager memory.
FsStateBackend : checkpoints written to a remote filesystem.
RocksDBStateBackend : state serialized to local RocksDB; supports incremental checkpoints.
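The backend is typically chosen in flink-conf.yaml; a sketch is shown below (the HDFS paths are placeholders, and key names may differ slightly across Flink versions):

```yaml
# Choose the state backend: 'jobmanager' (memory), 'filesystem', or 'rocksdb'
state.backend: rocksdb
# Where completed checkpoints and savepoints are written (placeholder paths)
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints
# RocksDB only: upload just the changed SST files on each checkpoint
state.backend.incremental: true
```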
5.3 Exactly‑once vs. at‑least‑once
Exactly‑once requires barrier alignment; at‑least‑once may replay records after a failure.
5.4 Savepoints
Manual, non‑incremental snapshots used for upgrades, migrations, or A/B testing.
5.5 Restoring from a checkpoint
./bin/flink run -s hdfs://192.168.0.1:8020/flink/checkpoint/...

5.6 Unaligned checkpoints
Barriers overtake buffered in-flight records, which are persisted as part of the checkpoint instead of waiting for alignment, allowing faster checkpointing under backpressure.
6. Window and Watermark Series
6.1 Time semantics
Event time, processing time, and ingestion time are supported.
6.2 Watermarks
Watermarks are monotonically increasing timestamps asserting that no more events with smaller timestamps are expected; an event-time window fires once the watermark reaches the window end.
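A bounded-out-of-orderness generator (the strategy behind Flink's forBoundedOutOfOrderness) can be sketched in plain Java; the class is illustrative, not Flink API:

```java
// Sketch of a bounded-out-of-orderness watermark generator: the watermark
// trails the highest event timestamp seen so far by a fixed allowed delay.
public class BoundedWatermark {
    private final long maxOutOfOrdernessMs;
    private long maxTimestamp = Long.MIN_VALUE; // undefined until first event

    BoundedWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    void onEvent(long eventTimestampMs) {
        // Out-of-order (smaller) timestamps never move the watermark back.
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMs);
    }

    long currentWatermark() {
        return maxTimestamp - maxOutOfOrdernessMs;
    }

    // A window covering [start, end) may fire once the watermark
    // reaches its end.
    boolean windowFires(long windowEndMs) {
        return currentWatermark() >= windowEndMs;
    }
}
```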
6.3 Late data handling
Late elements can be dropped, routed to a side output, or processed within an allowed-lateness interval.
6.4 Window functions
SQL windows: TUMBLE, HOP, SESSION, CUMULATE, plus OVER aggregations and grouping constructs such as GROUPING SETS and CUBE. DataStream API: incremental (ReduceFunction, AggregateFunction) and full (ProcessWindowFunction) aggregations.
7. Join Series
7.1 Dual‑stream join mechanisms
Methods include join , coGroup , intervalJoin , and connect with shared state.
7.2 Left join implementation using CoGroupFunction
Since Flink does not have a native left‑join, a custom CoGroupFunction can emit left records with nulls when no matching right record exists.
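The body of such a CoGroupFunction can be sketched in plain Java over the two per-key groups (types and names are illustrative; a real implementation would emit results via the Collector):

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors what a CoGroupFunction.coGroup(lefts, rights, out) body would do
// for one key and window: emit every left/right pairing, or the left
// record with a null right side when the right group is empty.
public class LeftJoinSketch {
    static List<String> leftJoin(Iterable<String> lefts, Iterable<String> rights) {
        List<String> out = new ArrayList<>();
        for (String l : lefts) {
            boolean matched = false;
            for (String r : rights) {
                out.add(l + "," + r);
                matched = true;
            }
            if (!matched) {
                out.add(l + ",null"); // the left-join case Flink lacks natively
            }
        }
        return out;
    }
}
```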
7.3 Dimension table lookup
Approaches: async I/O, broadcast state, async I/O with cache, or periodic refresh in the open method.
8. Miscellaneous
8.1 ProcessFunction variants
Includes ProcessFunction , KeyedProcessFunction , CoProcessFunction , ProcessJoinFunction , BroadcastProcessFunction , KeyedBroadcastProcessFunction , ProcessWindowFunction , and ProcessAllWindowFunction .
8.2 Memory management
Flink uses a combination of on‑heap, off‑heap, managed memory, and network buffers. JobManager and TaskManager have separate memory models with configurable heap, off‑heap, and process memory.
8.3 Serialization
Flink provides its own type information system (BasicTypeInfo, TupleTypeInfo, PojoTypeInfo, etc.) and generates efficient TypeSerializer implementations.
8.4 Rich functions vs. plain functions
Rich functions expose lifecycle methods ( open , close ) and a runtime context for accessing state, parallelism, and configuration.
8.5 Partitioning strategies
Global, Shuffle, Rebalance, Forward, Hash, Rescale, Broadcast, Custom
8.6 Distributed cache
Files can be registered with env.registerCachedFile("hdfs:///path", "alias") to avoid repeated downloads.
8.7 Asynchronous I/O
Define an AsyncFunction that issues non‑blocking requests, registers a callback to collect results, and use AsyncDataStream.unorderedWait or orderedWait with timeout and capacity settings.
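The pattern can be sketched with plain-Java CompletableFutures (names are illustrative; in a real AsyncFunction the callback completes a ResultFuture rather than being joined):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Sketch of the async-I/O pattern: each element triggers a non-blocking
// lookup, so lookups overlap instead of serializing one per element.
public class AsyncLookupSketch {
    static CompletableFuture<String> asyncLookup(String key) {
        // Stand-in for a real async database/HTTP client call.
        return CompletableFuture.supplyAsync(() -> key + "-enriched");
    }

    static List<String> enrichAll(List<String> keys) {
        // Issue all lookups first so they run concurrently...
        List<CompletableFuture<String>> futures = keys.stream()
                .map(AsyncLookupSketch::asyncLookup)
                .collect(Collectors.toList());
        // ...then collect results; orderedWait plays this role in Flink
        // (unorderedWait would emit in completion order instead).
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```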
8.8 Restart strategies
Fixed‑delay, failure‑rate, no‑restart, and fallback strategies are available to control job recovery behavior.
8.9 Data skew mitigation
Adjust source parallelism, add random prefixes (salting) before keyBy , and use two-phase aggregation (a local aggregation on salted keys followed by a global aggregation after the salt is stripped) to balance load.
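The two-phase idea can be sketched in plain Java (helper names are illustrative; in Flink each phase would be a keyed aggregation):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of two-phase aggregation for a skewed key: phase 1 counts per
// salted key (spreading one hot key over several groups, i.e. several
// parallel subtasks), phase 2 strips the salt and merges partial counts.
public class SaltedCount {
    static Map<String, Long> phase1(List<String> keys, int saltBuckets) {
        Map<String, Long> partial = new HashMap<>();
        int counter = 0;
        for (String key : keys) {
            // Round-robin salt: records for the same hot key land in
            // saltBuckets different groups.
            String saltedKey = (counter++ % saltBuckets) + "#" + key;
            partial.merge(saltedKey, 1L, Long::sum);
        }
        return partial;
    }

    static Map<String, Long> phase2(Map<String, Long> partial) {
        Map<String, Long> total = new HashMap<>();
        partial.forEach((saltedKey, count) ->
                total.merge(saltedKey.substring(saltedKey.indexOf('#') + 1),
                        count, Long::sum));
        return total;
    }
}
```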
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.