Understanding Apache Spark Unified (Dynamic) Memory Management
This article explains Spark's transition from static to unified memory management, detailing on‑heap and off‑heap memory regions, key configuration parameters, dynamic allocation behavior, and legacy mode, helping users optimize executor memory usage for both batch and streaming workloads.
Apache Spark is a leading in‑memory compute engine for big data, widely used in both batch and real‑time stream processing. When running jobs, it is essential to allocate appropriate resources to executors and understand how Spark manages memory to optimize performance.
In Spark versions up to 1.5, the default memory manager was StaticMemoryManager (static memory management). Starting with Spark 1.6.0, Spark switched to UnifiedMemoryManager, also called dynamic or unified memory management, which can adjust the sizes of Execution and Storage memory at runtime (see SPARK‑10000).
1. Spark Memory Management
Executor memory is divided into two main categories: Execution Memory for computation tasks such as shuffles, joins, sorts, and aggregations, and Storage Memory for cached data and internal data transfer.
By default, executors use on‑heap memory; off‑heap memory is available but disabled by default.
1.1 On‑heap Memory
On‑heap memory, configured via spark.executor.memory or --executor-memory, is split into four regions:
Execution Memory : used for shuffles, joins, sorts, aggregations (also called Shuffle Memory).
Storage Memory : used for cached data and unrolled data (Cache Memory).
User Memory : reserved for user‑defined data structures and metadata.
Reserved Memory : a default 300 MB system reserve for the JVM and Spark runtime (see SPARK‑12081).
The fraction of usable memory allocated to the combined Execution and Storage memory is controlled by spark.memory.fraction (default 0.75 in Spark 1.6, 0.6 in Spark 2.x). Within that, spark.memory.storageFraction (default 0.5) determines the split between Storage and Execution memory.
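As a concrete illustration, the on‑heap split for a hypothetical 4 GB executor under the Spark 2.x defaults (spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) can be worked out in a few lines of Scala. The formulas follow the region description above; the 4 GB figure is purely illustrative.

```scala
// Toy calculation of on-heap regions for a 4 GB executor (Spark 2.x defaults).
// All sizes in MB; values are illustrative, not read from a live executor.
val executorMemory  = 4096L   // --executor-memory 4g
val reservedMemory  = 300L    // fixed system reserve (SPARK-12081)
val memoryFraction  = 0.6     // spark.memory.fraction (2.x default)
val storageFraction = 0.5     // spark.memory.storageFraction (default)

val usableMemory    = executorMemory - reservedMemory          // 3796 MB
val unifiedMemory   = (usableMemory * memoryFraction).toLong   // 2277 MB shared by Execution + Storage
val storageMemory   = (unifiedMemory * storageFraction).toLong // 1138 MB initial Storage region
val executionMemory = unifiedMemory - storageMemory            // 1139 MB initial Execution region
val userMemory      = usableMemory - unifiedMemory             // 1519 MB for user data structures
```

Note that the Storage/Execution split computed here is only the initial boundary; as described below, the two regions can borrow from each other at runtime.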
1.2 Off‑heap Memory
Since Spark 1.6, off‑heap memory can be enabled to reduce GC overhead by setting spark.memory.offHeap.enabled=true and specifying its size with spark.memory.offHeap.size. Off‑heap memory only contains Execution and Storage regions; there is no separate User or Reserved memory.
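For example, off‑heap memory could be enabled at submission time like this (the 1 GB size is purely illustrative; choose a value appropriate for your workload, and remember it is allocated in addition to the on‑heap executor memory):

```
spark-submit \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=1g \
  ...
```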
2. Dynamic Memory Allocation
Unified memory management allows Execution and Storage memory to borrow from each other dynamically: if Execution memory is idle, Storage can use it, and vice versa. The borrowing is asymmetric, however: Execution can reclaim its space by evicting cached blocks that spilled beyond the region protected by spark.memory.storageFraction, while Storage cannot force Execution to release memory and must wait for it to be freed. This flexibility was not available in the static memory manager used before Spark 1.6.
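The asymmetry can be illustrated with a toy model in plain Scala. This is only a sketch of the borrowing/eviction idea, not Spark's actual UnifiedMemoryManager code; the class name and units are invented for illustration.

```scala
// Toy model of unified memory borrowing (illustrative only; Spark's real
// UnifiedMemoryManager is considerably more involved). Sizes are abstract units.
class ToyUnifiedPool(total: Long, storageFloor: Long) {
  var storageUsed = 0L
  var executionUsed = 0L

  // Storage may borrow idle Execution space, but can never evict Execution.
  def acquireStorage(n: Long): Boolean = {
    if (storageUsed + executionUsed + n <= total) { storageUsed += n; true }
    else false
  }

  // Execution may evict cached blocks, but only down to the storage floor
  // (the region protected by spark.memory.storageFraction).
  def acquireExecution(n: Long): Boolean = {
    val free = total - storageUsed - executionUsed
    if (free < n) {
      val evictable = math.max(0L, storageUsed - storageFloor)
      val toEvict = math.min(evictable, n - free)
      storageUsed -= toEvict // cached blocks are dropped to make room
    }
    if (total - storageUsed - executionUsed >= n) { executionUsed += n; true }
    else false
  }
}
```

With a pool of 100 units and a floor of 50, Storage can grow to 80 while Execution is idle; a later Execution request for 40 units then evicts 20 units of borrowed cache, leaving Storage at 60 — whereas in the reverse direction Storage would simply be refused.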
3. Legacy Mode
Although Spark 1.6 and later default to the unified manager, the older static manager (StaticMemoryManager) is still available via the spark.memory.useLegacyMode flag, which is false by default.
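Should the pre‑1.6 behavior be needed, for example to compare against an existing tuning, it can be re‑enabled at submit time:

```
spark-submit --conf spark.memory.useLegacyMode=true ...
```

In legacy mode the older settings such as spark.shuffle.memoryFraction and spark.storage.memoryFraction take effect again; they are ignored by the unified manager.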
4. Conclusion
From Spark 1.6.0 onward, the default memory management model is dynamic (unified) and continues through Spark 2.x. Understanding the division of on‑heap and off‑heap memory, as well as the key configuration parameters, is crucial for optimizing Spark jobs.