
Understanding Spark Unified Memory Management and Dynamic Allocation

This article explains Apache Spark's memory architecture, covering the shift from static to unified memory management, the roles of on‑heap and off‑heap memory, configurable parameters, dynamic memory sharing between execution and storage, and the legacy mode retained in Spark 1.6 for backward compatibility.


Apache Spark is a leading in‑memory engine for big data processing, used in both batch and streaming workloads. When running jobs, users allocate resources such as executor memory, and understanding how that memory is managed is crucial for performance optimization.

Until Spark 1.5, the default memory manager was StaticMemoryManager. Starting with Spark 1.6, Spark adopted the UnifiedMemoryManager, also called dynamic memory management, which can adjust the sizes of the Execution and Storage regions at runtime (see SPARK‑10000).

1. Spark Memory Management

Spark memory is divided into two main categories: Execution Memory for computation tasks like shuffles, joins, sorts, and aggregations, and Storage Memory for caching data and internal data transfer.

Executors use on‑heap memory by default; off‑heap memory is optional and disabled by default.

1.1 On‑heap Memory

The executor memory configured via spark.executor.memory or --executor-memory is on‑heap and consists of four regions:

Execution Memory: used for shuffles, joins, sorts, and aggregations (also called Shuffle Memory).

Storage Memory: used for cached data and unrolled data (also called Cache Memory).

User Memory: stores internal metadata and user‑defined data structures.

Reserved Memory: 300 MB by default, set aside for the JVM and system overhead (see SPARK‑12081).

Key configuration parameters:

spark.memory.fraction – the share of usable memory (JVM heap minus the 300 MB reserved) given to Spark Memory (Execution + Storage); default 0.75 in Spark 1.6 and 0.6 in Spark 2.x. The remainder becomes User Memory.

spark.memory.storageFraction – default 0.5, meaning Storage initially receives half of the Spark Memory; the boundary between Execution and Storage can move at runtime.
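The arithmetic behind these regions can be sketched in a few lines. The snippet below computes the on‑heap layout for a hypothetical 4 GB executor using the Spark 2.x defaults (`spark.memory.fraction=0.6`, `spark.memory.storageFraction=0.5`); it is illustrative only, since the real sizing lives inside Spark's UnifiedMemoryManager.

```python
# Sketch of the on-heap layout for a hypothetical 4 GB executor,
# using Spark 2.x defaults. Illustrative only.

RESERVED_MB = 300  # fixed reserved memory (SPARK-12081)

def on_heap_regions(executor_memory_mb, memory_fraction=0.6, storage_fraction=0.5):
    usable = executor_memory_mb - RESERVED_MB       # heap minus reserved
    spark_memory = usable * memory_fraction         # Execution + Storage
    user_memory = usable * (1 - memory_fraction)    # user data structures, metadata
    storage = spark_memory * storage_fraction       # initial Storage share
    execution = spark_memory - storage              # initial Execution share
    return {
        "reserved_mb": RESERVED_MB,
        "user_mb": user_memory,
        "storage_mb": storage,
        "execution_mb": execution,
    }

for name, mb in on_heap_regions(4096).items():
    print(f"{name}: {mb:.1f} MB")
```

For a 4096 MB executor this yields roughly 1518 MB of User Memory and about 1139 MB each for the initial Execution and Storage shares.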

1.2 Off‑heap Memory

Off‑heap memory, introduced in Spark 1.6 to reduce GC overhead, is disabled by default. It can be enabled with spark.memory.offHeap.enabled=true and sized via spark.memory.offHeap.size. Off‑heap memory contains only Execution and Storage regions; there is no User or Reserved memory.
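Because there is no reserved or user region, the whole off‑heap pool is Spark memory. The sketch below shows one plausible split of an off‑heap pool between Execution and Storage using the same storageFraction, a simplifying assumption for illustration (the exact off‑heap split varies between Spark versions):

```python
# Hedged sketch: splitting an off-heap pool of spark.memory.offHeap.size
# between Execution and Storage. No reserved or user region exists
# off-heap, so the entire pool is Spark memory.

def off_heap_regions(off_heap_size_mb, storage_fraction=0.5):
    storage = off_heap_size_mb * storage_fraction
    execution = off_heap_size_mb - storage
    return {"storage_mb": storage, "execution_mb": execution}

print(off_heap_regions(1024))  # a hypothetical 1 GB off-heap pool
```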

2. Dynamic Memory Allocation

In the unified model, Execution and Storage memory share a movable boundary: when one region has free space, the other can borrow it. The sharing is asymmetric: Execution can reclaim space by evicting cached Storage blocks, while Storage can never evict memory held by running tasks. This dynamic sharing replaces the fixed allocation of the static model used in Spark 1.5 and earlier.
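The borrowing rule can be modeled with a toy class. This is a deliberately simplified sketch, not Spark's implementation: it assumes a single pool with a soft boundary, where Execution may evict cached blocks only down to the storageFraction target, and Storage can use only whatever space Execution is not holding.

```python
# Toy model of the unified pool's asymmetric borrowing rule.
# Simplifications: single pool, all-or-nothing storage grants,
# eviction limited to storage usage above the soft boundary.

class UnifiedPool:
    def __init__(self, total_mb, storage_fraction=0.5):
        self.total = total_mb
        self.storage_target = total_mb * storage_fraction  # soft boundary
        self.execution_used = 0.0
        self.storage_used = 0.0

    def acquire_execution(self, mb):
        free = self.total - self.execution_used - self.storage_used
        if mb > free:
            # Evict cached blocks, but only the portion above the target.
            evictable = max(self.storage_used - self.storage_target, 0)
            evict = min(mb - free, evictable)
            self.storage_used -= evict
            free += evict
        granted = min(mb, free)
        self.execution_used += granted
        return granted

    def acquire_storage(self, mb):
        # Storage may only use space Execution is not holding; no eviction.
        free = self.total - self.execution_used - self.storage_used
        granted = mb if mb <= free else 0.0
        self.storage_used += granted
        return granted

pool = UnifiedPool(1000)
pool.acquire_storage(800)    # cache borrows into Execution's half
pool.acquire_execution(400)  # evicts 200 MB of cached blocks to fit
```

After the two calls, Storage has shrunk back to 600 MB: Execution reclaimed 200 MB by evicting cached blocks, which Storage could never have done to Execution.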

3. Legacy Mode

Although Spark 1.6 switched to the unified manager by default, the old static manager (StaticMemoryManager) can still be enabled by setting spark.memory.useLegacyMode=true (the flag defaults to false).
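For contrast with the unified layout above, the legacy static split can be sketched as follows, assuming the pre‑1.6 defaults of spark.shuffle.memoryFraction=0.2 (safety factor 0.8) and spark.storage.memoryFraction=0.6 (safety factor 0.9); under this manager the boundaries are fixed and neither region can borrow from the other.

```python
# Hedged sketch of the legacy StaticMemoryManager split, assuming the
# pre-1.6 default fractions and safety factors. Boundaries are fixed.

def static_regions(executor_memory_mb,
                   shuffle_fraction=0.2, shuffle_safety=0.8,
                   storage_fraction=0.6, storage_safety=0.9):
    execution = executor_memory_mb * shuffle_fraction * shuffle_safety
    storage = executor_memory_mb * storage_fraction * storage_safety
    other = executor_memory_mb - execution - storage
    return {"execution_mb": execution, "storage_mb": storage, "other_mb": other}

print(static_regions(4096))
```

For a 4096 MB executor this gives a fixed ~655 MB for Execution and ~2212 MB for Storage, with the rest left for everything else; idle Storage memory cannot help a shuffle‑heavy job, which is exactly the inflexibility the unified manager removes.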

4. Conclusion

Since Spark 1.6, the default memory management is dynamic (unified) and continues through Spark 2.x. This article summarized Spark’s overall memory usage, on‑heap and off‑heap memory structures, and the key parameters that control dynamic memory allocation.

Written by Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies