Understanding Spark Executor Memory Management: On‑Heap, Off‑Heap, and Unified Strategies
This article explains Spark's executor memory architecture, covering on‑heap and off‑heap allocation, static versus unified memory managers, storage and execution memory handling, RDD persistence levels, eviction policies, and shuffle memory usage, providing practical formulas and configuration tips for optimal performance.
1. On‑Heap and Off‑Heap Memory Planning
Each executor runs as a JVM process. Spark plans how tasks use the JVM's on‑heap memory and can also allocate off‑heap memory directly from the worker node's system memory, which reduces GC overhead and can improve shuffle efficiency.
On‑Heap Memory
Configured via --executor-memory or spark.executor.memory. The heap is divided into storage memory (caches RDDs and broadcast data) and execution memory (used by shuffle, join, sort and aggregation). The remaining space is not specially planned; it holds Spark's internal objects and the objects created by user code.
Memory allocation flow:
Spark creates a new object instance.
JVM allocates heap space and returns a reference.
Spark records the reference and the memory used.
Memory release flow:
Spark removes the reference and marks the memory for release.
JVM garbage collector eventually frees the heap space.
Serialized objects have a known byte size; non‑serialized objects are sampled periodically, which can cause estimation errors and occasional OOM.
Off‑Heap Memory
Enabled with spark.memory.offHeap.enabled and sized with spark.memory.offHeap.size. Off‑heap stores serialized binary data using the JDK Unsafe API, allowing precise allocation and release without safety‑fraction buffers.
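As a minimal sketch, the three settings mentioned so far can be collected and rendered as spark-submit flags. The sizes (4g and 2g) are illustrative assumptions, and toSubmitArgs is a hypothetical helper, not part of Spark.

```scala
object OffHeapConfigSketch {
  // Illustrative values only; tune them for the actual workload.
  val settings: Seq[(String, String)] = Seq(
    "spark.executor.memory"        -> "4g",   // on-heap size (same as --executor-memory)
    "spark.memory.offHeap.enabled" -> "true", // turn off-heap allocation on
    "spark.memory.offHeap.size"    -> "2g"    // off-heap budget per executor
  )

  // Hypothetical helper: render each setting as a spark-submit --conf flag.
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  def main(args: Array[String]): Unit =
    println(toSubmitArgs(settings).mkString(" "))
}
```

The off-heap size is a separate budget from the heap, so the executor's total footprint is roughly the sum of the two plus JVM overhead.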
2. Memory Management Interface
Spark provides a unified MemoryManager interface used by all tasks within an executor to request or release memory. The memoryMode argument selects on‑heap or off‑heap.
def acquireStorageMemory(blockId: BlockId, numBytes: Long, memoryMode: MemoryMode): Boolean
def acquireUnrollMemory(blockId: BlockId, numBytes: Long, memoryMode: MemoryMode): Boolean
def acquireExecutionMemory(numBytes: Long, taskAttemptId: Long, memoryMode: MemoryMode): Long
def releaseStorageMemory(numBytes: Long, memoryMode: MemoryMode): Unit
def releaseExecutionMemory(numBytes: Long, taskAttemptId: Long, memoryMode: MemoryMode): Unit
def releaseUnrollMemory(numBytes: Long, memoryMode: MemoryMode): Unit
3. Memory Space Allocation
Static Memory Management (pre‑Spark 1.6)
Storage, execution and other memory sizes are fixed for the lifetime of the application. Users configure fractions via parameters:
availableStorageMemory = systemMaxMemory * spark.storage.memoryFraction * spark.storage.safetyFraction
availableExecutionMemory = systemMaxMemory * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
systemMaxMemory is derived from the JVM heap size. The safety fraction reserves a buffer to mitigate OOM risk.
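The two static formulas can be worked through as a short sketch. The fraction values below are the legacy defaults of the static manager (0.6 and 0.9 for storage, 0.2 and 0.8 for shuffle); the object and method names are made up for illustration.

```scala
object StaticMemorySketch {
  // Legacy defaults of the static manager; all four were configurable.
  val StorageFraction = 0.6 // spark.storage.memoryFraction
  val StorageSafety   = 0.9 // spark.storage.safetyFraction
  val ShuffleFraction = 0.2 // spark.shuffle.memoryFraction
  val ShuffleSafety   = 0.8 // spark.shuffle.safetyFraction

  // availableStorageMemory = systemMaxMemory * fraction * safetyFraction
  def availableStorageMemory(systemMaxMemory: Long): Long =
    (systemMaxMemory * StorageFraction * StorageSafety).toLong

  // availableExecutionMemory = systemMaxMemory * fraction * safetyFraction
  def availableExecutionMemory(systemMaxMemory: Long): Long =
    (systemMaxMemory * ShuffleFraction * ShuffleSafety).toLong

  def main(args: Array[String]): Unit = {
    val heap = 4L * 1024 * 1024 * 1024 // assume a 4 GiB heap
    println(s"storage:   ${availableStorageMemory(heap)} bytes")
    println(s"execution: ${availableExecutionMemory(heap)} bytes")
  }
}
```

With these defaults roughly 54% of the heap is usable for storage and 16% for execution, which shows how much space the fixed split can leave idle.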
Unified Memory Management (Spark 1.6+)
Storage and execution memory share a common pool, and each side can dynamically borrow the other's idle space. This reduces the situation where one region is exhausted while the other sits largely idle.
Dynamic borrowing follows these rules:
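The initial split between the storage and execution regions is set via spark.memory.storageFraction.
If both regions are short of space, data spills to disk.
If one region is short of space and the other has free space, it can borrow from the other.
When storage has borrowed execution's space, execution can force the borrowed part to be evicted to disk and reclaim the space.
When execution has borrowed storage's space, storage cannot force it back, because the shuffle process holding that memory is too complex to interrupt.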
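The asymmetry in these rules is easier to see in a toy model. The sketch below is not Spark's implementation; it assumes the unified manager's 300 MiB reserved space and the defaults spark.memory.fraction = 0.6 and spark.memory.storageFraction = 0.5, and ToyUnifiedPool is a hypothetical class.

```scala
object UnifiedMemorySketch {
  val ReservedBytes: Long = 300L * 1024 * 1024 // fixed reserve kept out of the pool

  // Shared pool size: (heap - reserved) * spark.memory.fraction
  def unifiedPool(heapBytes: Long, memoryFraction: Double = 0.6): Long =
    ((heapBytes - ReservedBytes) * memoryFraction).toLong

  // Toy model of the borrowing rules only; sizes are in bytes.
  final class ToyUnifiedPool(total: Long, storageFraction: Double = 0.5) {
    private val storageRegion: Long = (total * storageFraction).toLong
    private var storageUsed = 0L
    private var executionUsed = 0L

    def storage: Long = storageUsed
    def execution: Long = executionUsed

    // Storage may take any free space but can never evict execution memory.
    def acquireStorage(bytes: Long): Boolean = {
      val free = total - storageUsed - executionUsed
      if (bytes <= free) { storageUsed += bytes; true } else false
    }

    // Execution may take free space and may also reclaim whatever storage
    // holds beyond its own region (those blocks would be evicted).
    def acquireExecution(bytes: Long): Long = {
      val free = total - storageUsed - executionUsed
      val reclaimable = math.max(0L, storageUsed - storageRegion)
      val granted = math.min(bytes, free + reclaimable)
      val evicted = math.max(0L, granted - free)
      storageUsed -= evicted
      executionUsed += granted
      granted
    }
  }
}
```

In this model a storage request fails once the pool is full, while an execution request can still succeed by pushing storage back to its own region.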
4. Storage Memory Management
Persisting an RDD uses the Storage module, which manages Blocks (one per Partition) on both driver and executor.
StorageLevel combines five flags (disk, memory, off‑heap, deserialized, replication). Example definition:
class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1
)
Key dimensions:
Location: Disk / On‑heap / Off‑heap.
Form: Serialized vs. deserialized.
Replication count.
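To make the flag combinations concrete, here is a simplified stand-in for StorageLevel with a few of the familiar named levels. Level and describe are hypothetical names for illustration; the flag values follow Spark's predefined MEMORY_ONLY, MEMORY_ONLY_SER, MEMORY_AND_DISK_2 and DISK_ONLY levels.

```scala
object StorageLevelSketch {
  // Simplified stand-in for Spark's StorageLevel with the same five flags.
  final case class Level(
      useDisk: Boolean,
      useMemory: Boolean,
      useOffHeap: Boolean,
      deserialized: Boolean,
      replication: Int = 1) {

    // Summarize the three dimensions: location, form, replication count.
    def describe: String = {
      val location = Seq(
        if (useDisk) Some("disk") else None,
        if (useMemory && !useOffHeap) Some("on-heap") else None,
        if (useOffHeap) Some("off-heap") else None
      ).flatten.mkString("+")
      val form = if (deserialized) "deserialized" else "serialized"
      s"$location, $form, ${replication}x"
    }
  }

  val MemoryOnly     = Level(useDisk = false, useMemory = true, useOffHeap = false, deserialized = true)
  val MemoryOnlySer  = Level(useDisk = false, useMemory = true, useOffHeap = false, deserialized = false)
  val MemoryAndDisk2 = Level(useDisk = true, useMemory = true, useOffHeap = false, deserialized = true, replication = 2)
  val DiskOnly       = Level(useDisk = true, useMemory = false, useOffHeap = false, deserialized = false)

  def main(args: Array[String]): Unit =
    Seq(MemoryOnly, MemoryOnlySer, MemoryAndDisk2, DiskOnly).foreach(l => println(l.describe))
}
```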
During caching, a Partition is transformed into a Block. Serialized Blocks use SerializedMemoryEntry (ByteBuffer); deserialized Blocks use DeserializedMemoryEntry (object array). Executors keep Blocks in a LinkedHashMap, tracking memory usage.
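The unroll phase reserves temporary space while a Partition is materialized into a Block; if that space is insufficient, unrolling fails and the Partition is not cached in memory. Serialized partitions can compute the required space up front, while deserialized partitions estimate it record by record during iteration and request more as they go.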
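The incremental case can be sketched as a loop that grows its reservation while records arrive. This is a toy under stated assumptions: recordSizes stands in for the sampled per-record estimates, and the fixed chunk size is invented for illustration.

```scala
object UnrollSketch {
  // Returns Some(bytes reserved) if the whole partition fits,
  // or None as soon as another chunk cannot be reserved.
  def unroll(
      recordSizes: Iterator[Long],
      freeUnrollMemory: Long,
      chunk: Long = 1024L): Option[Long] = {
    var reserved = 0L // unroll memory obtained so far
    var used = 0L     // estimated size of the records read so far
    while (recordSizes.hasNext) {
      used += recordSizes.next()
      // Grow the reservation chunk by chunk until it covers the estimate.
      while (used > reserved) {
        if (reserved + chunk > freeUnrollMemory) return None
        reserved += chunk
      }
    }
    Some(reserved)
  }
}
```

A None result corresponds to the failed cache attempt described above; the partition is then handled according to its storage level rather than kept in memory.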
Eviction and Disk Spill
When storage memory is full, Spark evicts older Blocks in LRU order. A candidate must use the same MemoryMode as the new Block and must not belong to the same RDD as the new Block, which avoids cyclic eviction. If an evicted Block's storage level includes disk, it is written to disk; otherwise it is discarded.
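A rough sketch of this policy, using an access-ordered java.util.LinkedHashMap as the LRU structure. ToyStore and Block are hypothetical types, the MemoryMode check is omitted for brevity, and the disk write is only reported as a flag rather than performed.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.collection.mutable.ArrayBuffer

object EvictionSketch {
  // Hypothetical, simplified block record.
  final case class Block(rddId: Int, size: Long, hasDiskLevel: Boolean)

  final class ToyStore {
    // accessOrder = true: iteration runs from least to most recently used.
    private val blocks = new JLinkedHashMap[String, Block](16, 0.75f, true)

    def put(id: String, block: Block): Unit = { blocks.put(id, block); () }
    def get(id: String): Option[Block] = Option(blocks.get(id))
    def size: Int = blocks.size()

    // Free at least `needed` bytes for a new block of RDD `newRddId`.
    // Returns (blockId, writtenToDisk) per victim, or empty if not enough
    // can be freed, in which case nothing is evicted.
    def evict(needed: Long, newRddId: Int): Seq[(String, Boolean)] = {
      val victims = ArrayBuffer.empty[(String, Boolean)]
      var freed = 0L
      val it = blocks.entrySet().iterator()
      while (freed < needed && it.hasNext) {
        val entry = it.next()
        val block = entry.getValue
        if (block.rddId != newRddId) { // never evict blocks of the same RDD
          victims += ((entry.getKey, block.hasDiskLevel))
          freed += block.size
        }
      }
      if (freed < needed) Seq.empty
      else {
        victims.foreach { case (id, _) => blocks.remove(id) }
        victims.toSeq
      }
    }
  }
}
```

Victims are collected first and removed only after the scan, so the map is never modified while it is being iterated.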
5. Execution Memory Management
Tasks share execution memory; each task may use between 1/(2N) and 1/N of the pool, where N is the number of concurrent tasks. A task requests at least 1/(2N) memory; if unavailable, it blocks until memory is freed.
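The per-task bounds are simple arithmetic. A small sketch, where bounds is a hypothetical helper and sizes are in bytes:

```scala
object TaskMemoryBounds {
  // Returns (minimum guaranteed, maximum allowed) execution memory per task
  // for a pool of `poolBytes` shared by `activeTasks` concurrent tasks.
  def bounds(poolBytes: Long, activeTasks: Int): (Long, Long) = {
    require(activeTasks > 0, "need at least one active task")
    val minPerTask = poolBytes / (2L * activeTasks) // 1/(2N) of the pool
    val maxPerTask = poolBytes / activeTasks        // 1/N of the pool
    (minPerTask, maxPerTask)
  }

  def main(args: Array[String]): Unit = {
    val (lo, hi) = bounds(1200L, 4)
    println(s"min=$lo max=$hi")
  }
}
```

Because N changes as tasks start and finish, both bounds are moving targets and are re-evaluated on each request.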
Shuffle Memory Usage
Shuffle write uses either ExternalSorter (heap execution memory) or ShuffleExternalSorter (heap or off‑heap depending on configuration). Shuffle read aggregates data with an Aggregator, also using heap execution memory. When the in‑memory hash map (AppendOnlyMap) grows too large, Spark samples its size and spills to disk, later merging the files.
Tungsten’s page‑based memory management abstracts both on‑heap and off‑heap pages via MemoryBlock objects, each identified by a 64‑bit logical address (13‑bit page number + 51‑bit offset). This enables efficient pointer‑based sorting without deserialization.
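The 13 + 51 bit layout can be illustrated with plain bit operations. This is a sketch of the encoding idea only; the object and method names are invented here and are not Spark's API.

```scala
object PageAddressSketch {
  val PageBits = 13
  val OffsetBits = 51
  val OffsetMask: Long = (1L << OffsetBits) - 1 // lower 51 bits set

  // Pack a page number (upper 13 bits) and an in-page offset (lower 51 bits).
  def encode(page: Int, offset: Long): Long = {
    require(page >= 0 && page < (1 << PageBits), "page number out of range")
    require(offset >= 0 && offset <= OffsetMask, "offset out of range")
    (page.toLong << OffsetBits) | offset
  }

  // Unsigned shift so the top bit of the page number is not sign-extended.
  def pageOf(address: Long): Int = (address >>> OffsetBits).toInt

  def offsetOf(address: Long): Long = address & OffsetMask
}
```

With 13 bits for the page number, a task can address at most 8192 pages, and a sort can compare or move these 64-bit values instead of the records they point to.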
6. Summary
Spark’s memory management combines on‑heap, off‑heap, static and unified strategies to balance storage and execution needs. Understanding allocation formulas, configuration parameters and eviction policies is essential for avoiding OOM errors and achieving optimal performance.
dbaplus Community
