Understanding Flink 1.11 JobManager and TaskManager Memory Configuration
This article details the major memory model changes in Flink 1.11 for JobManager and TaskManager, compares them with Flink 1.9, provides concrete JVM command examples, explains the relationship between memory settings and parallelism, and introduces fine‑grained resource management for streaming workloads.
JobManager Memory Model
In Flink 1.11 the JobManager memory model is unified and consists of heap, direct memory, JVM metaspace, and JVM overhead. The following diagram shows the layout:
The main difference from Flink 1.9 is that 1.9 only distinguished heap and off-heap memory, whereas Flink 1.11 splits the off-heap portion into Direct Memory, JVM Metaspace, and JVM Overhead.
Key parameters are illustrated in the next figure:
Example: if the total JobManager memory is set to 1 GB, the generated JVM options are:
-Xmx469762048
-Xms469762048
-XX:MaxDirectMemorySize=134217728
-XX:MaxMetaspaceSize=268435456
Resulting allocation:
Heap: 448 MB
DirectMemory: 128 MB
Metaspace: 256 MB
Overhead: 192 MB (min(max(192 MB, 0.1 * 1 GB), 1 GB))
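The arithmetic behind this breakdown can be sketched as follows. This is a minimal sketch, assuming the Flink 1.11 defaults quoted in the example (256 MB metaspace, 128 MB off-heap, overhead clamped between 192 MB and 1 GB at a 0.1 fraction); it is not Flink's actual implementation.

```python
MB = 1024 * 1024

def jobmanager_breakdown(total_mb):
    """Sketch of the Flink 1.11 JobManager memory derivation for a
    given total process size, using the default component sizes."""
    metaspace = 256          # -XX:MaxMetaspaceSize
    off_heap = 128           # -XX:MaxDirectMemorySize
    # JVM overhead: 0.1 of total, clamped to [192 MB, 1 GB]
    overhead = min(max(192, 0.1 * total_mb), 1024)
    # Heap is whatever remains; this becomes -Xmx / -Xms
    heap = total_mb - metaspace - off_heap - overhead
    return heap, off_heap, metaspace, overhead

heap, direct, meta, overhead = jobmanager_breakdown(1024)
print(heap, direct, meta, overhead)   # 448 128 256 192
print(int(heap * MB))                 # 469762048, matching -Xmx above
```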
Recommendation: only configure the total JobManager memory; all other values can remain at their defaults.
TaskManager Memory Model
Flink 1.11 introduces a more detailed TaskManager memory model, as shown below:
Core configuration parameters are illustrated in the following diagram:
Recommendation: only set the total TaskManager memory; other settings can stay default.
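Following these recommendations, a minimal flink-conf.yaml only needs the two process-size totals; Flink derives all the component sizes from them. The values below simply mirror the examples in this article.

```yaml
# flink-conf.yaml -- set only the totals; Flink derives the rest
jobmanager.memory.process.size: 1gb
taskmanager.memory.process.size: 4196m
```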
Example: for a TaskManager with 4196 MB total memory, the JVM options become:
# JVM configuration
-Xmx973204290 # 928.12MB
-Xms973204290
-XX:MaxDirectMemorySize=1241639855 # 1184.12MB
-XX:MaxMetaspaceSize=268435456 # 256MB
# Calculated configuration
taskmanager.memory.framework.off-heap.size=134217728b
taskmanager.memory.network.max=1107422127b
taskmanager.memory.network.min=1107422127b
taskmanager.memory.task.off-heap.size=0b
taskmanager.memory.framework.heap.size=134217728b
taskmanager.memory.task.heap.size=838986562b
taskmanager.memory.managed.size=1476562799b
If many TaskManagers are used, a common error is insufficient network buffers. Adjust the following configuration keys to resolve it: taskmanager.memory.network.min, taskmanager.memory.network.max, and taskmanager.memory.network.fraction.
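The TaskManager derivation follows the same pattern as the JobManager one, just with more components. The sketch below uses the documented Flink 1.11 defaults (network fraction 0.1 in [64 MB, 1 GB], managed fraction 0.4, 128 MB framework heap and off-heap); these constants are assumptions, and the example above evidently uses non-default network settings, so override the parameters to mirror a custom configuration.

```python
def taskmanager_breakdown(total_mb,
                          network_fraction=0.1, network_min=64, network_max=1024,
                          managed_fraction=0.4):
    """Sketch of the Flink 1.11 TaskManager memory derivation.
    All sizes are in MB; constants are the documented defaults."""
    metaspace = 256
    overhead = min(max(192, 0.1 * total_mb), 1024)
    flink_mem = total_mb - metaspace - overhead   # total Flink memory
    network = min(max(network_min, network_fraction * flink_mem), network_max)
    managed = managed_fraction * flink_mem
    framework_heap, framework_off_heap, task_off_heap = 128, 128, 0
    task_heap = (flink_mem - framework_heap - framework_off_heap
                 - task_off_heap - network - managed)
    heap = framework_heap + task_heap                     # -Xmx / -Xms
    direct = framework_off_heap + task_off_heap + network  # -XX:MaxDirectMemorySize
    return heap, direct, managed, network

# With a plain 4 GB process size and all defaults:
heap, direct, managed, network = taskmanager_breakdown(4096)
```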
Additional Note: Memory Settings and Parallelism
A job's memory budget must be planned together with its parallelism. For a total memory demand of 20 GB, you could run 2 TaskManagers with 10 GB each or 10 TaskManagers with 2 GB each; slot allocation and Kafka partition distribution determine which layout performs better.
The official recommendation is to set the number of slots per TaskManager to a multiple of the machine's CPU cores, typically 10-20 times the core count (e.g., 20-40 slots on a 2-core machine, 40-80 slots on a 4-core machine).
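These sizing rules can be turned into a small planning helper. This is a sketch under the guideline above; the slot multiplier and the even per-slot memory split are taken from this article's recommendations, not from anything Flink enforces.

```python
import math

def plan_taskmanagers(total_memory_gb, memory_per_tm_gb, cores_per_machine,
                      slots_per_core=10):
    """Sketch: derive TaskManager count, slots per TaskManager, and
    per-slot memory for a given total memory budget."""
    num_tms = math.ceil(total_memory_gb / memory_per_tm_gb)
    slots_per_tm = cores_per_machine * slots_per_core
    memory_per_slot_gb = memory_per_tm_gb / slots_per_tm
    return num_tms, slots_per_tm, memory_per_slot_gb

# 20 GB total on 4-core machines: 2 TMs x 10 GB vs. 10 TMs x 2 GB
print(plan_taskmanagers(20, 10, 4))  # (2, 40, 0.25)
print(plan_taskmanagers(20, 2, 4))   # (10, 40, 0.05)
```

Fewer, larger TaskManagers give each slot more memory; more, smaller ones spread work (and failure impact) across more processes.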
Since Flink 1.14, fine‑grained resource management allows users to specify exact resource profiles per slot (e.g., 0.25 Core + 1 GB memory), and Flink will allocate matching slots dynamically while leaving unused resources available.
Understanding these details helps answer interview questions thoroughly and demonstrates deep knowledge of Flink’s resource management.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
