Understanding Spark Executor Memory Management: On‑Heap, Off‑Heap, and Unified Memory
This article explains Spark's executor memory architecture, covering on‑heap and off‑heap memory planning, static and unified memory managers, storage and execution memory allocation, RDD persistence, eviction policies, and shuffle memory usage, providing practical guidance for performance tuning.
Overview
Spark is an in‑memory distributed computing engine whose memory management module is crucial for application development and performance tuning. The article outlines Spark memory management concepts based on Spark 2.1 and assumes readers are familiar with Spark, Java, RDD, Shuffle, and the JVM.
When a Spark application runs, the cluster launches a Driver JVM (the master) and multiple Executor JVMs (workers). The Driver creates the Spark context and schedules tasks, while Executors perform the actual computation and store persisted RDDs. This article focuses on Executor memory.
On‑Heap and Off‑Heap Memory Planning
Executor memory management builds on the JVM's own memory management. Spark plans the JVM heap (on‑heap) space in finer detail and also introduces off‑heap memory, which is allocated directly from the worker node's system memory, to improve utilization.
2.1 On‑Heap Memory
The on‑heap size is configured via --executor-memory or spark.executor.memory. Tasks running concurrently in an Executor share the JVM heap. Memory used for cached RDDs and broadcast data is classified as Storage memory, and memory used during Shuffle is classified as Execution memory. The remaining heap space holds Spark's internal objects and objects created by user code.
Spark’s on‑heap management is logical: object allocation and deallocation are handled by the JVM, so Spark can only record memory usage after an object has been allocated and before it is actually freed.
Memory allocation:
Spark creates a new object in code.
The JVM allocates heap space and returns a reference.
Spark stores the reference and records the memory usage.
Memory release:
Spark removes the reference and records the release.
The JVM garbage collector eventually frees the heap space.
Serialized objects occupy a calculable amount of memory, while non‑serialized objects are estimated by periodic sampling, which can introduce errors. Objects marked for release may not be immediately reclaimed, leading to discrepancies between Spark’s view and actual available heap memory, and potentially causing OOM errors.
Although precise control is impossible, Spark’s separate planning for Storage and Execution memory allows it to decide whether to cache new RDDs or allocate execution memory, improving overall utilization.
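The bookkeeping described in this section can be illustrated with a minimal sketch. `LogicalTracker` and its methods are hypothetical names, not Spark classes; the only point is that the tracker counts bytes while the JVM performs the real allocation and garbage collection.

```scala
// Toy model of logical memory accounting; not Spark's MemoryManager.
final class LogicalTracker(val limit: Long) {
  private var recorded = 0L

  /** Bytes the tracker believes are in use. */
  def used: Long = recorded

  /** Record an allocation; refuse it if the logical budget would be exceeded. */
  def record(bytes: Long): Boolean =
    if (bytes >= 0 && recorded + bytes <= limit) { recorded += bytes; true } else false

  /** Record a release. The JVM may reclaim the heap space only later, at GC time. */
  def release(bytes: Long): Unit =
    recorded = math.max(0L, recorded - bytes)
}

object LogicalTrackerDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new LogicalTracker(limit = 1024L)
    val buffer = new Array[Byte](512)       // the JVM allocates the heap space
    println(tracker.record(buffer.length))  // the tracker only records the size
    tracker.release(buffer.length)          // logical release; GC frees it later
    println(tracker.used)
  }
}
```

Because `release` only updates a counter, the tracker can report free space that the garbage collector has not yet reclaimed, which is the discrepancy this section warns about.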
2.2 Off‑Heap Memory
Off‑heap memory stores serialized binary data directly in system memory, reducing GC overhead and improving shuffle sorting efficiency. It is enabled via spark.memory.offHeap.enabled and sized with spark.memory.offHeap.size. Off‑heap memory shares the same partitioning concept as on‑heap memory (Storage and Execution).
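As a configuration sketch, the settings named in sections 2.1 and 2.2 could be collected as key/value pairs like this. The sizes are illustrative assumptions, not recommendations, and `ExecutorMemorySettings` is a hypothetical helper; such pairs are typically passed as `--conf key=value` to `spark-submit` or set on a `SparkConf`.

```scala
object ExecutorMemorySettings {
  // Illustrative values only; tune them for the actual workload.
  val settings: Map[String, String] = Map(
    "spark.executor.memory"        -> "4g",         // on-heap size per Executor
    "spark.memory.offHeap.enabled" -> "true",       // turn on off-heap memory
    "spark.memory.offHeap.size"    -> "2147483648"  // off-heap size in bytes (2 GiB)
  )

  /** Render the settings as spark-submit arguments, sorted by key. */
  def asSubmitArgs: Seq[String] =
    settings.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```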
2.3 Memory Management Interface
Spark provides a unified MemoryManager interface for requesting and releasing memory:
// acquire storage memory
def acquireStorageMemory(blockId: BlockId, numBytes: Long, memoryMode: MemoryMode): Boolean
// acquire unroll memory
def acquireUnrollMemory(blockId: BlockId, numBytes: Long, memoryMode: MemoryMode): Boolean
// acquire execution memory
def acquireExecutionMemory(numBytes: Long, taskAttemptId: Long, memoryMode: MemoryMode): Long
// release storage memory
def releaseStorageMemory(numBytes: Long, memoryMode: MemoryMode): Unit
// release execution memory
def releaseExecutionMemory(numBytes: Long, taskAttemptId: Long, memoryMode: MemoryMode): Unit
// release unroll memory
def releaseUnrollMemory(numBytes: Long, memoryMode: MemoryMode): Unit
The MemoryMode argument determines whether the operation targets on‑heap or off‑heap memory.
Memory Space Allocation
3.1 Static Memory Management
In the original static manager, Storage, Execution, and other memory sizes are fixed for the lifetime of the application. Users configure the sizes before launch.
Available heap memory is calculated as:
availableStorageMemory = systemMaxMemory * spark.storage.memoryFraction * spark.storage.safetyFraction
availableExecutionMemory = systemMaxMemory * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
Off‑heap allocation is simpler: only Storage and Execution memory share the space, governed by spark.memory.storageFraction, without a safety fraction.
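The two on‑heap formulas can be worked through with a small sketch. The default fractions below (0.6 and 0.9 for storage, 0.2 and 0.8 for shuffle) are the Spark 1.x defaults as far as I know and should be treated as assumptions; `StaticMemorySketch` is a hypothetical helper, not part of Spark.

```scala
object StaticMemorySketch {
  /** systemMaxMemory * spark.storage.memoryFraction * spark.storage.safetyFraction */
  def availableStorage(systemMaxMemory: Long,
                       memoryFraction: Double = 0.6, // assumed default
                       safetyFraction: Double = 0.9  // assumed default
                      ): Double =
    systemMaxMemory * memoryFraction * safetyFraction

  /** systemMaxMemory * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction */
  def availableExecution(systemMaxMemory: Long,
                         memoryFraction: Double = 0.2, // assumed default
                         safetyFraction: Double = 0.8  // assumed default
                        ): Double =
    systemMaxMemory * memoryFraction * safetyFraction

  def main(args: Array[String]): Unit = {
    val heap = 4L * 1024 * 1024 * 1024 // a hypothetical 4 GiB heap
    println(f"storage:   ${availableStorage(heap) / (1024.0 * 1024)}%.1f MiB")
    println(f"execution: ${availableExecution(heap) / (1024.0 * 1024)}%.1f MiB")
  }
}
```

With these assumed defaults, roughly 54% of the heap is available for storage and 16% for execution; the rest is left to other objects and to the safety margins.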
3.2 Unified Memory Management
Since Spark 1.6, the Unified Memory Manager allows Storage and Execution memory to share a common pool, dynamically borrowing idle space from each other.
Key rules of the dynamic borrowing mechanism:
The initial boundary between the Storage and Execution regions is set via spark.memory.storageFraction.
If one side runs out of space, it can borrow from the other; if both run out, data is spilled to disk.
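When Storage has borrowed space from Execution, Execution can reclaim it: the cached blocks occupying that space are evicted to disk and the space is returned.
When Execution has borrowed space from Storage, Storage cannot force Execution to return it, because of the complexities of the shuffle process; Storage has to wait until Execution releases the space.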
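The asymmetry in these rules can be sketched with a toy pool. `ToyUnifiedPool` is a hypothetical, heavily simplified model and not Spark's UnifiedMemoryManager: it tracks only byte counts, and a storage request that does not fit simply fails instead of evicting older blocks.

```scala
// Toy model of the borrowing rules; not Spark's UnifiedMemoryManager.
final class ToyUnifiedPool(val total: Long, storageFraction: Double) {
  // Storage's own region; execution can never evict storage below this size.
  val storageRegion: Long = (total * storageFraction).toLong
  private var storageUsed = 0L
  private var executionUsed = 0L

  def storage: Long = storageUsed
  def execution: Long = executionUsed
  def free: Long = total - storageUsed - executionUsed

  /** Storage may borrow any free space but can never take memory held by execution. */
  def acquireStorage(bytes: Long): Boolean =
    if (bytes <= free) { storageUsed += bytes; true } else false

  /** Execution uses free space first, then reclaims what storage borrowed. Returns bytes granted. */
  def acquireExecution(bytes: Long): Long = {
    val shortfall = math.max(0L, bytes - free)
    val borrowedByStorage = math.max(0L, storageUsed - storageRegion)
    val evicted = math.min(shortfall, borrowedByStorage)
    storageUsed -= evicted // in Spark the evicted blocks would be spilled or dropped
    val granted = math.min(bytes, free)
    executionUsed += granted
    granted
  }
}
```

For example, in a pool of 1000 bytes split evenly, storage can first grow to 800 bytes; a later execution request for 400 bytes is granted in full by pushing storage back to 600, whereas a storage request made when no space is free is refused.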
Storage Memory Management
4.1 RDD Persistence
RDDs are immutable partitioned collections. Persisting an RDD (via persist or cache) stores its partitions in memory or on disk, so that subsequent actions do not have to recompute them. The Storage module (the Driver’s BlockManager acting as master, each Executor’s BlockManager acting as slave) manages Blocks, each of which represents a cached partition.
Spark defines seven storage levels (e.g., MEMORY_ONLY, MEMORY_AND_DISK) composed of five flags: useDisk, useMemory, useOffHeap, deserialized, and replication.
class StorageLevel private(
private var _useDisk: Boolean, // disk
private var _useMemory: Boolean, // on‑heap
private var _useOffHeap: Boolean, // off‑heap
private var _deserialized: Boolean, // non‑serialized
private var _replication: Int = 1 // replicas
)
Storage level determines location (disk/on‑heap/off‑heap), format (serialized vs. deserialized), and replication factor.
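To make the flag combinations concrete, here is a small sketch that mirrors the five fields with a hypothetical `Level` case class (not Spark's StorageLevel). The flag values given for the four named levels follow Spark's predefined MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, and DISK_ONLY as I understand them.

```scala
// Toy mirror of the five StorageLevel fields; not Spark's StorageLevel class.
final case class Level(useDisk: Boolean,
                       useMemory: Boolean,
                       useOffHeap: Boolean,
                       deserialized: Boolean,
                       replication: Int = 1) {
  /** Human-readable summary: location, format, replica count. */
  def describe: String = {
    val where = Seq(
      if (useMemory && !useOffHeap) Some("on-heap") else None,
      if (useOffHeap) Some("off-heap") else None,
      if (useDisk) Some("disk") else None
    ).flatten.mkString("+")
    val format = if (deserialized) "deserialized" else "serialized"
    s"$where, $format, ${replication}x"
  }
}

object Levels {
  val MemoryOnly    = Level(useDisk = false, useMemory = true, useOffHeap = false, deserialized = true)
  val MemoryAndDisk = Level(useDisk = true, useMemory = true, useOffHeap = false, deserialized = true)
  val MemoryOnlySer = Level(useDisk = false, useMemory = true, useOffHeap = false, deserialized = false)
  val DiskOnly      = Level(useDisk = true, useMemory = false, useOffHeap = false, deserialized = false)
}
```

Calling `Levels.MemoryAndDisk.describe` yields `on-heap+disk, deserialized, 1x`, which reads off the three decisions a storage level encodes.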
4.2 RDD Caching Process
Before caching, a partition is accessed via an Iterator. Caching converts the partition into a Block. Depending on the storage level, the Block is stored either serialized (SerializedMemoryEntry) or deserialized (DeserializedMemoryEntry). Each Executor keeps its in‑memory Blocks in a LinkedHashMap; adding an entry to the map corresponds to acquiring memory, and removing one corresponds to releasing it.
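Converting a partition from the non‑contiguous records behind an Iterator into a contiguous Block is called unrolling (Unroll). During unroll, Spark requests temporary unroll space from the MemoryManager. For a serialized partition, the required space can be computed up front and requested in one step; for a non‑serialized partition, Spark estimates the space record by record and requests more as it reads. If unroll succeeds, the temporary unroll space is converted into regular storage space for the cached partition.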
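The incremental case can be sketched as follows. `UnrollSketch`, the fixed `chunk` size, and the per-record `sizeOf` function are all assumptions made for illustration; Spark's own implementation samples sizes and grows its reservation differently.

```scala
object UnrollSketch {
  /**
   * Materialise `records` while reserving memory in `chunk`-sized steps.
   * Returns Right(all records) if the reservation stayed within `budget`,
   * or Left(number of records unrolled before memory ran out).
   */
  def unroll[A](records: Iterator[A],
                sizeOf: A => Long,
                budget: Long,
                chunk: Long): Either[Int, Vector[A]] = {
    require(chunk > 0, "chunk must be positive")
    val buffer = Vector.newBuilder[A]
    var reserved = 0L
    var used = 0L
    var count = 0
    var failed = false
    while (!failed && records.hasNext) {
      val record = records.next()
      val needed = used + sizeOf(record)
      // Grow the reservation until the record fits or the budget is exhausted.
      while (!failed && needed > reserved) {
        if (reserved + chunk <= budget) reserved += chunk else failed = true
      }
      if (!failed) { buffer += record; used = needed; count += 1 }
    }
    if (failed) Left(count) else Right(buffer.result())
  }
}
```

A caller would treat `Right` as a partition that can be cached in memory and `Left` as one that did not fit.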
4.3 Eviction and Disk Spill
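When storage memory is full and a new Block needs space, Spark evicts existing Blocks from the LinkedHashMap in least‑recently‑used (LRU) order. A Block is eligible for eviction only if it uses the same MemoryMode as the new Block, belongs to a different RDD than the new Block (which avoids circular eviction), and is not currently being read. If the evicted Block’s storage level includes disk, it is written to disk; otherwise it is simply dropped.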
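A minimal sketch of this selection logic, assuming an access-ordered `java.util.LinkedHashMap` whose iteration order runs from least to most recently used. `EvictionSketch`, `Entry`, and `selectVictims` are hypothetical names, and the sketch only chooses victims; it does not spill or drop anything.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.collection.mutable.ArrayBuffer

object EvictionSketch {
  final case class Entry(rddId: Int, size: Long, offHeap: Boolean, beingRead: Boolean = false)

  /**
   * Walk the map in LRU order and collect eligible blocks until `needed` bytes
   * would be freed. Returns None if eligible blocks cannot free enough space.
   */
  def selectVictims(entries: JLinkedHashMap[String, Entry],
                    needed: Long,
                    incomingRddId: Int,
                    offHeap: Boolean): Option[Seq[String]] = {
    val victims = ArrayBuffer.empty[String]
    var freed = 0L
    val it = entries.entrySet().iterator()
    while (freed < needed && it.hasNext) {
      val e = it.next()
      val block = e.getValue
      // Same memory mode, different RDD than the incoming block, and not being read.
      if (block.offHeap == offHeap && block.rddId != incomingRddId && !block.beingRead) {
        victims += e.getKey
        freed += block.size
      }
    }
    if (freed >= needed) Some(victims.toSeq) else None
  }
}
```

Returning `None` when the eligible blocks cannot free enough space models the case where the new Block cannot be cached and nothing should be evicted.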
Execution Memory Management
5.1 Memory Allocation Among Tasks
All tasks running in an Executor share its execution memory. With N concurrently running tasks, each task can hold between 1/(2N) and 1/N of the execution memory pool. A task whose request cannot bring it up to the 1/(2N) minimum blocks until other tasks release enough memory.
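The per-task bounds are simple arithmetic, shown here as a sketch with a hypothetical `TaskMemoryBounds` helper that uses integer division on byte counts.

```scala
object TaskMemoryBounds {
  /** (guaranteed minimum, maximum) share of the execution pool for one of `activeTasks` tasks. */
  def bounds(executionPool: Long, activeTasks: Int): (Long, Long) = {
    require(activeTasks > 0, "at least one active task is required")
    (executionPool / (2L * activeTasks), executionPool / activeTasks)
  }
}
```

For a pool of 1200 bytes and 4 active tasks, `TaskMemoryBounds.bounds(1200L, 4)` returns `(150, 300)`. Because N is the number of currently running tasks, the bounds change as tasks start and finish.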
5.2 Shuffle Memory Usage
Shuffle write uses either ExternalSorter (heap) or ShuffleExternalSorter (heap or off‑heap, depending on Tungsten). Shuffle read stores intermediate aggregation results in heap execution memory. When the in‑memory hash map (AppendOnlyMap) grows too large, Spark spills to disk and later merges the spilled files.
Tungsten abstracts memory pages via MemoryBlock, using a 64‑bit logical address (13‑bit page number + 51‑bit offset). This unified addressing allows sorting without deserialization, greatly improving CPU and memory efficiency.
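The 13‑bit/51‑bit split can be illustrated with plain bit operations. `PageAddress` is a hypothetical helper written for this article; it shows the packing idea only and is not Spark's TaskMemoryManager code.

```scala
object PageAddress {
  private val PageBits = 13
  private val OffsetBits = 51
  private val OffsetMask = (1L << OffsetBits) - 1

  /** Pack a page number (upper 13 bits) and an offset (lower 51 bits) into one Long. */
  def encode(page: Int, offset: Long): Long = {
    require(page >= 0 && page < (1 << PageBits), "page number must fit in 13 bits")
    require(offset >= 0 && offset <= OffsetMask, "offset must fit in 51 bits")
    (page.toLong << OffsetBits) | offset
  }

  /** Upper 13 bits: index into the page table. */
  def page(address: Long): Int = (address >>> OffsetBits).toInt

  /** Lower 51 bits: position inside the page. */
  def offset(address: Long): Long = address & OffsetMask
}
```

Since a record's location fits in a single Long, a sorter can rearrange these addresses instead of moving or deserializing the records themselves.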
In summary, Spark’s storage memory is managed with a LinkedHashMap of blocks, while execution memory relies on AppendOnlyMap and page‑based management for shuffle operations, offering distinct yet complementary strategies.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
