Understanding Flink 1.10 TaskManager Memory Model and Configuration Parameters
This article explains the new unified TaskManager memory model introduced in Flink 1.10, detailing each memory component, its configuration parameters, how they map to JVM settings, and practical guidance for both standalone and containerized deployments, including a concrete YARN example.
Preface
The Flink community introduced a new unified TaskManager memory model and configuration in FLIP-49, one of the main improvements in Flink 1.10. The proposal addresses three drawbacks of the memory configuration in Flink 1.9 and earlier: streaming and batch jobs were configured differently, the memory settings for the RocksDB state backend were overly complex, and some details, such as the container memory cut-off, were hidden and hard to understand.
Overview of the New Memory Model and Parameters
[Figure: official diagram of the Flink 1.10 TaskManager memory model]
Each region of the model is explained below.
Total Flink Memory
Meaning: All Flink-related memory used by the TaskManager process, excluding JVM metaspace and JVM overhead. It consists of four parts: framework memory (heap and off-heap), managed memory (off-heap), network cache (off-heap), and task memory (heap and off-heap).
Parameter: taskmanager.memory.flink.size. No default; the user must set it unless taskmanager.memory.process.size is set instead.
Framework Memory
Meaning: Memory used by the Flink runtime itself. It is generally fixed and needs adjusting only in special cases, such as very high parallelism or intensive interaction with external systems.
Parameters:
taskmanager.memory.framework.heap.size: default 128 MB.
taskmanager.memory.framework.off-heap.size: default 128 MB (direct memory).
Managed Memory
Meaning: Pure off-heap memory managed by the MemoryManager and used for buffering intermediate results, sorting, hash tables, and the RocksDB state backend. Because RocksDB draws from managed memory, users can now bound its memory usage explicitly.
Parameters:
taskmanager.memory.managed.fraction: fraction of total Flink memory allocated to managed memory, default 0.4.
taskmanager.memory.managed.size: explicit size of managed memory, no default. When it is not set, the size is derived from the fraction.
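The interplay between the two managed-memory parameters can be sketched in a few lines of Python. This is a minimal illustration, not Flink code: the function name `derive_managed_memory_mb` and its arguments are hypothetical, and it assumes that an explicitly configured size takes precedence over the fraction.

```python
from typing import Optional


def derive_managed_memory_mb(
    total_flink_mb: float,
    fraction: float = 0.4,
    explicit_size_mb: Optional[float] = None,
) -> float:
    """Return the managed memory size in MB (hypothetical helper, not a Flink API)."""
    if explicit_size_mb is not None:
        # Assumption: taskmanager.memory.managed.size, when set, wins over the fraction.
        return explicit_size_mb
    # Otherwise derive the size from taskmanager.memory.managed.fraction.
    return total_flink_mb * fraction


# Derived from the default fraction of 0.4.
print(derive_managed_memory_mb(2048))
# An explicit size overrides the fraction.
print(derive_managed_memory_mb(2048, explicit_size_mb=512))
```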
Network Cache
Meaning: Pure off-heap memory used for data transfer between TaskManagers (shuffle, broadcast, etc.) and with external components. It is allocated as direct memory.
Parameters:
taskmanager.memory.network.min: minimum network cache, default 64 MB.
taskmanager.memory.network.max: maximum network cache, default 1 GB.
taskmanager.memory.network.fraction: fraction of total Flink memory, default 0.1. The resulting size is bounded by the min and max values.
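The "fraction bounded by min and max" rule amounts to clamping. The sketch below is an illustration under the defaults listed above (taking 1 GB as 1024 MB); `derive_network_memory_mb` is a hypothetical name, not part of Flink.

```python
def derive_network_memory_mb(
    total_flink_mb: float,
    fraction: float = 0.1,
    min_mb: float = 64.0,
    max_mb: float = 1024.0,
) -> float:
    """Clamp fraction * total Flink memory into [min_mb, max_mb] (illustrative only)."""
    return min(max(total_flink_mb * fraction, min_mb), max_mb)


# Small TaskManager: 10% of 400 MB is below the minimum, so the minimum applies.
print(derive_network_memory_mb(400))
# Large TaskManager: 10% of 20480 MB exceeds the maximum, so the maximum applies.
print(derive_network_memory_mb(20480))
# In between, the fraction decides.
print(derive_network_memory_mb(4096))
```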
Task Memory
Meaning: Memory actually occupied by operator logic, user code, and custom data structures.
Parameters:
taskmanager.memory.task.heap.size: no default; usually derived automatically as what remains of total Flink memory after the other components.
taskmanager.memory.task.off-heap.size: default 0, meaning no task off-heap memory is reserved.
Total Process Memory
Meaning: The sum of total Flink memory, JVM metaspace, and JVM overhead. In containerized deployments (YARN, Kubernetes, Mesos) this is the memory size of the container itself.
Parameter: taskmanager.memory.process.size. No default; must be set by the user.
JVM Metaspace
Meaning: The JVM metaspace, which holds class metadata.
Parameter: taskmanager.memory.jvm-metaspace.size. Default 256 MB (the initial 1.10.0 release shipped with 96 MB; check the default of the exact version you run).
JVM Overhead
Meaning: Additional native memory reserved for thread stacks, the code cache, and similar JVM needs. It plays a role similar to the old container heap cut-off. The previous containerized.heap-cutoff-ratio and containerized.heap-cutoff-min parameters no longer apply.
Parameters:
taskmanager.memory.jvm-overhead.min: default 192 MB.
taskmanager.memory.jvm-overhead.max: default 1 GB.
taskmanager.memory.jvm-overhead.fraction: fraction of total process memory, default 0.1. The resulting size is bounded by the min and max values.
Relationship Between Flink Memory Parameters and JVM Options
-Xmx / -Xms: the sum of framework heap memory and task heap memory.
-XX:MaxDirectMemorySize: the sum of framework off-heap memory, task off-heap memory, and network cache.
-XX:MaxMetaspaceSize: the JVM metaspace size.
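The mapping can be written out as a small Python sketch. The function `jvm_memory_flags` and its parameter names are hypothetical and only mirror the sums described above; emitting byte values is a choice made here to avoid fractional megabytes.

```python
def jvm_memory_flags(
    framework_heap_mb: float,
    task_heap_mb: float,
    framework_off_heap_mb: float,
    task_off_heap_mb: float,
    network_mb: float,
    metaspace_mb: float,
) -> list:
    """Build JVM memory flags from the memory components (illustrative only)."""

    def to_bytes(mb: float) -> int:
        return int(round(mb * 1024 * 1024))

    # Heap: framework heap + task heap.
    heap = to_bytes(framework_heap_mb + task_heap_mb)
    # Direct memory: framework off-heap + task off-heap + network.
    direct = to_bytes(framework_off_heap_mb + task_off_heap_mb + network_mb)
    metaspace = to_bytes(metaspace_mb)
    return [
        f"-Xmx{heap}",
        f"-Xms{heap}",
        f"-XX:MaxDirectMemorySize={direct}",
        f"-XX:MaxMetaspaceSize={metaspace}",
    ]


flags = jvm_memory_flags(
    framework_heap_mb=128,
    task_heap_mb=896,
    framework_off_heap_mb=128,
    task_off_heap_mb=0,
    network_mb=512,
    metaspace_mb=256,
)
print(" ".join(flags))
```

Note that managed memory and JVM overhead do not appear in any flag: in this model they are native memory that is not covered by the heap, direct-memory, or metaspace limits.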
How Should You Configure?
Although many parameters exist, most users only need to set a few:
For a Standalone deployment, set taskmanager.memory.flink.size.
For a containerized deployment, set taskmanager.memory.process.size.
Let Flink handle the remaining allocations automatically. If fine-tuning is required, first adjust the network cache fraction (taskmanager.memory.network.fraction) according to network traffic and the managed memory fraction (taskmanager.memory.managed.fraction) according to RocksDB state size. Fixing the sizes of too many components at once can leave them inconsistent with the configured total, which causes configuration conflicts and deployment failures.
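To see why fixing too many sizes is risky, consider the simplified check below. It is not Flink's actual validation logic; `remaining_after_fixed_sizes` is a hypothetical helper that only verifies that the explicitly fixed components still fit into total Flink memory.

```python
def remaining_after_fixed_sizes(total_flink_mb: float, fixed_sizes_mb: dict) -> float:
    """Return the memory left for derived components, or raise on a conflict.

    fixed_sizes_mb maps a component name to the size (MB) the user pinned.
    Simplified illustration only; Flink performs its own, more detailed checks.
    """
    fixed_total = sum(fixed_sizes_mb.values())
    if fixed_total > total_flink_mb:
        raise ValueError(
            f"fixed components need {fixed_total} MB, "
            f"but total Flink memory is only {total_flink_mb} MB"
        )
    return total_flink_mb - fixed_total


# Consistent: 512 MB remain for the components that are still derived.
print(remaining_after_fixed_sizes(2048, {"managed": 1024, "task.heap": 512}))

# Conflicting: the pinned sizes alone exceed the total.
try:
    remaining_after_fixed_sizes(1024, {"managed": 1024, "task.heap": 512})
except ValueError as err:
    print("conflict:", err)
```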
Example
Assume a Flink‑on‑YARN environment with the following settings:
taskmanager.memory.process.size = 4096 MB
taskmanager.memory.network.fraction = 0.15
taskmanager.memory.managed.fraction = 0.45
All other parameters keep their defaults. The derived memory sizes are:
JVM overhead = 4096 * 0.1 = 409.6 MB
Total Flink memory (taskmanager.memory.flink.size) = 4096 - 409.6 - 256 = 3430.4 MB, where 256 MB is the JVM metaspace
Network cache = 3430.4 * 0.15 = 514.56 MB
Managed memory = 3430.4 * 0.45 = 1543.68 MB
Task heap (taskmanager.memory.task.heap.size) = 3430.4 - 128 * 2 - 1543.68 - 514.56 = 1116.16 MB, where 128 * 2 is the framework heap plus framework off-heap memory
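The same arithmetic can be reproduced with a short Python sketch. It assumes the default values quoted in this article (256 MB metaspace, 128 MB each for framework heap and off-heap, 0 MB task off-heap, 1 GB taken as 1024 MB); the function and key names are hypothetical and are not Flink APIs.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Bound value to the interval [low, high]."""
    return min(max(value, low), high)


def derive_from_process_size(
    process_mb: float,
    network_fraction: float = 0.1,
    managed_fraction: float = 0.4,
    metaspace_mb: float = 256.0,
    framework_heap_mb: float = 128.0,
    framework_off_heap_mb: float = 128.0,
    task_off_heap_mb: float = 0.0,
) -> dict:
    """Derive the memory components (MB) from total process memory (sketch only)."""
    # JVM overhead: fraction 0.1 of process memory, bounded by 192 MB and 1 GB.
    jvm_overhead = clamp(process_mb * 0.1, 192.0, 1024.0)
    # Total Flink memory is what remains after overhead and metaspace.
    flink = process_mb - jvm_overhead - metaspace_mb
    # Network cache: fraction of total Flink memory, bounded by 64 MB and 1 GB.
    network = clamp(flink * network_fraction, 64.0, 1024.0)
    managed = flink * managed_fraction
    # Task heap receives whatever is left.
    task_heap = (
        flink
        - framework_heap_mb
        - framework_off_heap_mb
        - task_off_heap_mb
        - managed
        - network
    )
    return {
        "jvm_overhead": jvm_overhead,
        "flink": flink,
        "network": network,
        "managed": managed,
        "task_heap": task_heap,
    }


result = derive_from_process_size(4096, network_fraction=0.15, managed_fraction=0.45)
for name, size_mb in result.items():
    print(f"{name}: {size_mb:.2f} MB")
```

Changing `process_mb` or the two fractions shows how the task heap shrinks or grows as the remainder, which is why adjusting fractions is usually safer than pinning absolute sizes.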
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
