Comprehensive Guide to Flink Deployment, State Programming, Checkpointing, and Performance Tuning
This article provides an extensive overview of Apache Flink, covering deployment modes, cluster sizing, job submission workflows, state programming concepts, checkpoint mechanisms, backpressure handling, comparison with Spark, and practical code snippets for configuration and optimization.
1. Flink Submission Series
1.1 How to submit Flink?
Local mode : JobManager and TaskManager share a JVM, requiring only a JDK. Suitable for single‑node debugging.
Standalone mode : Flink's built‑in distributed cluster; JobManager acts as Master, TaskManagers as Workers, independent of other resource managers.
YARN mode : Flink runs on YARN in one of three sub-modes.

Session mode : JobManager and TaskManagers are pre-started and shared by all jobs; the cluster stops only when the session is closed. Advantage: resources are fully shared and jobs start quickly. Disadvantage: weak isolation, and scaling TaskManagers is difficult. The main method runs on the client.

Per-job mode : each job gets its own JobManager and TaskManagers, and the cluster stops when the job stops. Advantage: strong isolation, with resources allocated per job. Disadvantage: slower job startup, since every job launches its own JobManager. The main method runs on the client.

Application mode : each application runs with a dedicated JobManager and TaskManager set, and the cluster stops when the application finishes. Advantage: low client load, isolation between applications, and resource sharing inside an application; effectively an optimized blend of per-job and session modes. The main method runs on the cluster.
1.2 Flink cluster sizing and project responsibilities
Key factors include record count and size, number of keys and state size per key, state update frequency, network capacity, disk bandwidth, and the number of machines together with their CPU and memory.
Typical project tasks: real‑time monitoring (user behavior alerts, server attack alerts), real‑time dashboards, data products, data‑driven operations, stream analytics, and real‑time data warehouses.
1.3 Flink job submission workflow and YARN interaction
Upload Flink JAR and configuration to HDFS for shared access.
Client submits the job to ResourceManager, which allocates a container and starts an ApplicationMaster.
ApplicationMaster loads HDFS config, starts JobManager, which builds the execution graph.
JobManager requests resources from ResourceManager; containers are launched for TaskManagers.
TaskManagers send heartbeats; JobManager assigns tasks.
1.4 Flink job submission commands and parameters
# Submit to a running YARN session
./bin/flink run -t yarn-session \
  -Dyarn.application.id=application_XXXX_YY \
  ./examples/streaming/TopSpeedWindowing.jar

# Per-job mode, detached
./bin/flink run -t yarn-per-job \
  --detached ./examples/streaming/TopSpeedWindowing.jar

# Application mode
./bin/flink run-application -t yarn-application \
  ./examples/streaming/TopSpeedWindowing.jar

# Common options
#   -c <class>                  entry class of the jar
#   -Dyarn.application.type=... YARN application type shown in the ResourceManager UI
#   -p <n>                      job parallelism
#   -yn <n>                     number of TaskManagers (legacy CLI, deprecated)
#   ... (additional flags omitted for brevity)

1.5 JobManager overview
The JobManager controls each Flink application, receives the JobGraph, creates an ExecutionGraph, requests slots from TaskManagers, and coordinates checkpoints and cancellations.
Typical deployment uses one primary JobManager with two standby instances managed via ZooKeeper for high availability.
1.6 TaskManager overview
TaskManagers run the actual tasks, each containing a configurable number of slots. Slots define the parallel execution capacity; a TaskManager can have multiple slots, and a common guideline is to set the slot count no higher than the number of CPU cores.
1.7 Slots and parallelism
Slots are the smallest execution unit; parallelism cannot exceed the total number of slots. Example: three TaskManagers each with 3 slots provide 9 slots total; setting parallelism to 1 would use only one slot.
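The slot arithmetic above can be sanity-checked with a plain-Java sketch (the class and method names are illustrative, not Flink API):

```java
public class SlotMath {
    // Total slots available across the cluster.
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // A job can only be scheduled if its parallelism does not exceed
    // the total slot count.
    static boolean fits(int parallelism, int taskManagers, int slotsPerTaskManager) {
        return parallelism <= totalSlots(taskManagers, slotsPerTaskManager);
    }

    public static void main(String[] args) {
        System.out.println(totalSlots(3, 3)); // 9 slots in total
        System.out.println(fits(9, 3, 3));    // true: cluster fully utilized
        System.out.println(fits(10, 3, 3));   // false: job cannot be scheduled
    }
}
```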
1.8 Operator chains
Flink chains operators into a single task to reduce thread switches, serialization, and buffer copying, thereby lowering latency and increasing throughput.
1.9 Dependency on Hadoop
Flink can run independently of Hadoop but integrates with YARN for resource scheduling and can read/write HDFS for checkpoints.
2. State Programming Series
2.1 Operator state vs. keyed state
Keyed state is partitioned by key and provides isolation per key (e.g., ValueState, ListState, MapState, ReducingState, AggregatingState). Operator state is scoped to the operator instance and is used for sources, sinks, or when no key is defined.
2.2 Choosing between ValueState and ListState for an array of 10 ints
Keyed state: use a ValueState holding the whole array (e.g., ValueState<int[]>) for update-in-place access, or a ListState<Integer> for additive operations. Operator state: prefer ListState .
2.3 Using MapState for grouping by id
MapState is a keyed state; the key must be part of the stream (via keyBy ) before using it.
2.4 Managing Kafka offsets with state
Offsets are stored in checkpointed state, typically as a ListState<Tuple2<KafkaTopicPartition, Long>>. The following code (abridged from FlinkKafkaConsumerBase) shows snapshot handling:

private transient ListState<Tuple2<KafkaTopicPartition, Long>> unionOffsetStates;

public void snapshotState(FunctionSnapshotContext context) throws Exception {
    if (!running) {
        LOG.debug("snapshotState() called on closed source");
    } else {
        unionOffsetStates.clear();
        // fetch current offsets and add each partition/offset pair
        // to unionOffsetStates
        // in ON_CHECKPOINTS commit mode, remember the offsets so they
        // can be committed back to Kafka once the checkpoint completes
    }
}

2.5 Window processing: first and last element
Use a WindowFunction and retrieve the first and last element from the iterable input.
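A minimal plain-Java sketch of what the apply() body would do with its iterable input (the helper is illustrative; elements arrive in processing order, so sort by timestamp first if event-time order matters):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FirstLast {
    // Mirrors what a WindowFunction's apply() body would do with its
    // Iterable<IN> input: walk it once, keeping the first and last element.
    static <T> List<T> firstAndLast(Iterable<T> elements) {
        Iterator<T> it = elements.iterator();
        T first = it.next(); // assumes a non-empty window
        T last = first;
        while (it.hasNext()) {
            last = it.next();
        }
        return Arrays.asList(first, last);
    }

    public static void main(String[] args) {
        System.out.println(firstAndLast(Arrays.asList(3, 1, 4, 1, 5))); // [3, 5]
    }
}
```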
3. Backpressure and Issues Series
3.1 Monitoring Flink and handling backpressure
Metrics include job uptime, restarts, record rates, CPU, heap usage, GC, network buffers, and checkpoint statistics. Backpressure is detected via the ratio of blocked threads to total threads; mitigation includes scaling resources, optimizing code, and adjusting parallelism.
3.2 Dealing with OOM caused by large state (e.g., PV/UV)
Offload the deduplication state to Redis, or use a Bloom filter to bound memory; note that Bloom filters have a small false-positive rate.
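As a sketch of the Bloom-filter approach, here is a minimal plain-Java filter (the size and hash scheme are illustrative, not tuned production values; a real job would typically use a library implementation such as Guava's):

```java
import java.util.BitSet;

// Minimal Bloom filter sketch for UV-style deduplication.
public class BloomDedup {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomDedup(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Simple seeded polynomial hash over the key's characters.
    private int index(String key, int seed) {
        int h = 0;
        for (char c : key.toCharArray()) {
            h = h * seed + c;
        }
        return Math.floorMod(h, size);
    }

    // Returns true the first time a key is seen. May rarely return false
    // for a genuinely new key: that is the false-positive rate.
    boolean addIfAbsent(String key) {
        boolean seen = true;
        for (int i = 0; i < hashes; i++) {
            int idx = index(key, 31 + i * 2);
            if (!bits.get(idx)) {
                seen = false;
                bits.set(idx);
            }
        }
        return !seen;
    }

    public static void main(String[] args) {
        BloomDedup f = new BloomDedup(1 << 20, 3);
        System.out.println(f.addIfAbsent("user-42")); // true: count this user
        System.out.println(f.addIfAbsent("user-42")); // false: duplicate
    }
}
```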
3.3 Ensuring GMV correctness after MySQL corrections
Use offline jobs to recompute and correct GMV after source data changes.
4. Spark vs. Flink Comparison
Key differences: architecture (Spark: Master/Worker/Driver/Executor; Flink: JobManager/TaskManager/Slot), scheduling (Spark processes micro-batches, while Flink builds a JobGraph → ExecutionGraph for true streaming), time semantics (Spark's DStream API supports only processing time, with Structured Streaming adding event time; Flink supports processing, event, and ingestion time with watermarks), and fault tolerance (Spark Streaming guarantees at-least-once by default; Flink achieves exactly-once internally via checkpoints and end-to-end via two-phase-commit sinks).
5. Checkpoint Series
5.1 Checkpoint implementation
Flink uses asynchronous barrier snapshots (Chandy‑Lamport algorithm) instead of pausing the job. The CheckpointCoordinator triggers barriers, sources inject them, downstream operators align barriers, and the JobManager collects state handles.
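The alignment step can be sketched as a toy two-input operator in plain Java (names are illustrative; real Flink tracks barriers per checkpoint id across many input channels):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy model of exactly-once barrier alignment on a two-input operator:
// once a barrier arrives on one channel, that channel's records are
// buffered until the matching barrier arrives on the other channel.
public class BarrierAlignment {
    private final boolean[] barrierSeen = new boolean[2];
    private final ArrayDeque<String> buffered = new ArrayDeque<>();
    final List<String> processed = new ArrayList<>();

    void onRecord(int channel, String record) {
        if (barrierSeen[channel]) {
            buffered.add(record);  // hold back until alignment completes
        } else {
            processed.add(record); // belongs to the pre-checkpoint epoch
        }
    }

    void onBarrier(int channel) {
        barrierSeen[channel] = true;
        if (barrierSeen[0] && barrierSeen[1]) {
            // Alignment complete: the state snapshot would be taken here,
            // then the buffered post-barrier records are released.
            while (!buffered.isEmpty()) {
                processed.add(buffered.poll());
            }
            barrierSeen[0] = barrierSeen[1] = false;
        }
    }
}
```

Note how a record arriving on a channel after that channel's barrier is not processed until the other channel's barrier arrives, which is exactly what keeps post-barrier records out of the snapshot.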
5.2 State backends
MemoryStateBackend : state kept in JVM heap; checkpoints stored in JobManager memory.
FsStateBackend : checkpoints written to a remote filesystem.
RocksDBStateBackend : state serialized to local RocksDB; supports incremental checkpoints.
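The backend is typically chosen in flink-conf.yaml; a sketch is shown below (the HDFS paths are placeholders, and key names may differ slightly across Flink versions):

```yaml
# Choose the state backend: 'jobmanager' (memory), 'filesystem', or 'rocksdb'
state.backend: rocksdb
# Where completed checkpoints and savepoints are written (placeholder paths)
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints
# RocksDB only: upload just the changed SST files on each checkpoint
state.backend.incremental: true
```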
5.3 Exactly‑once vs. at‑least‑once
Exactly‑once requires barrier alignment; at‑least‑once may replay records after a failure.
5.4 Savepoints
Manual, non‑incremental snapshots used for upgrades, migrations, or A/B testing.
5.5 Restoring from a checkpoint
./bin/flink run -s hdfs://192.168.0.1:8020/flink/checkpoint/...

5.6 Unaligned checkpoints
Barriers overtake buffered in-flight records, which are persisted as part of the checkpoint instead of waiting for alignment, allowing faster checkpointing under backpressure.
6. Window and Watermark Series
6.1 Time semantics
Event time, processing time, and ingestion time are supported.
6.2 Watermarks
Watermarks are monotonically increasing timestamps asserting that no more events with smaller timestamps are expected; an event-time window fires once the watermark reaches the window end.
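A bounded-out-of-orderness generator (the strategy behind Flink's forBoundedOutOfOrderness) can be sketched in plain Java; the class is illustrative, not Flink API:

```java
// Sketch of a bounded-out-of-orderness watermark generator: the watermark
// trails the highest event timestamp seen so far by a fixed allowed delay.
public class BoundedWatermark {
    private final long maxOutOfOrdernessMs;
    private long maxTimestamp = Long.MIN_VALUE; // undefined until first event

    BoundedWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    void onEvent(long eventTimestampMs) {
        // Out-of-order (smaller) timestamps never move the watermark back.
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMs);
    }

    long currentWatermark() {
        return maxTimestamp - maxOutOfOrdernessMs;
    }

    // A window covering [start, end) may fire once the watermark
    // reaches its end.
    boolean windowFires(long windowEndMs) {
        return currentWatermark() >= windowEndMs;
    }
}
```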
6.3 Late data handling
Late elements can be dropped, routed to a side output, or processed within an allowed-lateness interval.
6.4 Window functions
SQL windows: TUMBLE, HOP, SESSION, CUMULATE, plus OVER aggregations and grouping constructs such as GROUPING SETS and CUBE. DataStream API: incremental (ReduceFunction, AggregateFunction) and full (ProcessWindowFunction) aggregations.
7. Join Series
7.1 Dual‑stream join mechanisms
Methods include join , coGroup , intervalJoin , and connect with shared state.
7.2 Left join implementation using CoGroupFunction
Since Flink does not have a native left‑join, a custom CoGroupFunction can emit left records with nulls when no matching right record exists.
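The body of such a CoGroupFunction can be sketched in plain Java over the two per-key groups (types and names are illustrative; a real implementation would emit results via the Collector):

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors what a CoGroupFunction.coGroup(lefts, rights, out) body would do
// for one key and window: emit every left/right pairing, or the left
// record with a null right side when the right group is empty.
public class LeftJoinSketch {
    static List<String> leftJoin(Iterable<String> lefts, Iterable<String> rights) {
        List<String> out = new ArrayList<>();
        for (String l : lefts) {
            boolean matched = false;
            for (String r : rights) {
                out.add(l + "," + r);
                matched = true;
            }
            if (!matched) {
                out.add(l + ",null"); // the left-join case Flink lacks natively
            }
        }
        return out;
    }
}
```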
7.3 Dimension table lookup
Approaches: async I/O, broadcast state, async I/O with cache, or periodic refresh in the open method.
8. Miscellaneous
8.1 ProcessFunction variants
Includes ProcessFunction , KeyedProcessFunction , CoProcessFunction , ProcessJoinFunction , BroadcastProcessFunction , KeyedBroadcastProcessFunction , ProcessWindowFunction , and ProcessAllWindowFunction .
8.2 Memory management
Flink uses a combination of on‑heap, off‑heap, managed memory, and network buffers. JobManager and TaskManager have separate memory models with configurable heap, off‑heap, and process memory.
8.3 Serialization
Flink provides its own type information system (BasicTypeInfo, TupleTypeInfo, PojoTypeInfo, etc.) and generates efficient TypeSerializer implementations.
8.4 Rich functions vs. plain functions
Rich functions expose lifecycle methods ( open , close ) and a runtime context for accessing state, parallelism, and configuration.
8.5 Partitioning strategies
Global, Shuffle, Rebalance, Forward, Hash, Rescale, Broadcast, Custom
8.6 Distributed cache
Files can be registered with env.registerCachedFile("hdfs:///path", "alias") to avoid repeated downloads.
8.7 Asynchronous I/O
Define an AsyncFunction that issues non‑blocking requests, registers a callback to collect results, and use AsyncDataStream.unorderedWait or orderedWait with timeout and capacity settings.
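The pattern can be sketched with plain-Java CompletableFutures (names are illustrative; in a real AsyncFunction the callback completes a ResultFuture rather than being joined):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Sketch of the async-I/O pattern: each element triggers a non-blocking
// lookup, so lookups overlap instead of serializing one per element.
public class AsyncLookupSketch {
    static CompletableFuture<String> asyncLookup(String key) {
        // Stand-in for a real async database/HTTP client call.
        return CompletableFuture.supplyAsync(() -> key + "-enriched");
    }

    static List<String> enrichAll(List<String> keys) {
        // Issue all lookups first so they run concurrently...
        List<CompletableFuture<String>> futures = keys.stream()
                .map(AsyncLookupSketch::asyncLookup)
                .collect(Collectors.toList());
        // ...then collect results; orderedWait plays this role in Flink
        // (unorderedWait would emit in completion order instead).
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```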
8.8 Restart strategies
Fixed‑delay, failure‑rate, no‑restart, and fallback strategies are available to control job recovery behavior.
8.9 Data skew mitigation
Adjust source parallelism, add random prefixes (salting) before keyBy , and use two-phase aggregation (a local aggregation on salted keys followed by a global aggregation after the salt is stripped) to balance load.
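The two-phase idea can be sketched in plain Java (helper names are illustrative; in Flink each phase would be a keyed aggregation):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of two-phase aggregation for a skewed key: phase 1 counts per
// salted key (spreading one hot key over several groups, i.e. several
// parallel subtasks), phase 2 strips the salt and merges partial counts.
public class SaltedCount {
    static Map<String, Long> phase1(List<String> keys, int saltBuckets) {
        Map<String, Long> partial = new HashMap<>();
        int counter = 0;
        for (String key : keys) {
            // Round-robin salt: records for the same hot key land in
            // saltBuckets different groups.
            String saltedKey = (counter++ % saltBuckets) + "#" + key;
            partial.merge(saltedKey, 1L, Long::sum);
        }
        return partial;
    }

    static Map<String, Long> phase2(Map<String, Long> partial) {
        Map<String, Long> total = new HashMap<>();
        partial.forEach((saltedKey, count) ->
                total.merge(saltedKey.substring(saltedKey.indexOf('#') + 1),
                        count, Long::sum));
        return total;
    }
}
```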
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.