Understanding Apache Spark Unified (Dynamic) Memory Management
This article explains Spark's transition from static to unified memory management, detailing on‑heap and off‑heap memory regions, key configuration parameters, dynamic allocation behavior, and legacy mode, helping users optimize executor memory usage for both batch and streaming workloads.
Apache Spark is a leading in‑memory compute engine for big data, widely used in both batch and real‑time stream processing. When running jobs, it is essential to allocate appropriate resources to executors and understand how Spark manages memory to optimize performance.
In Spark versions up to 1.5, the default memory manager was StaticMemoryManager (static memory management). Starting with Spark 1.6.0, Spark switched to UnifiedMemoryManager, also called dynamic or unified memory management, which can adjust the sizes of Execution and Storage memory at runtime (see SPARK‑10000).
1. Spark Memory Management
Executor memory is divided into two main categories: Execution Memory for computation tasks such as shuffles, joins, sorts, and aggregations, and Storage Memory for cached data and internal data transfer.
By default, executors use on‑heap memory; off‑heap memory is available but disabled by default.
1.1 On‑heap Memory
On‑heap memory, configured via spark.executor.memory or --executor-memory, is split into four regions:
Execution Memory : used for shuffles, joins, sorts, aggregations (also called Shuffle Memory).
Storage Memory : used for cached data and unrolled data (Cache Memory).
User Memory : reserved for user‑defined data structures and metadata.
Reserved Memory : a default 300 MB system reserve for the JVM and Spark runtime (see SPARK‑12081).
The fraction of usable memory allocated to the combined Execution and Storage memory is controlled by spark.memory.fraction (default 0.75 in Spark 1.6, 0.6 in Spark 2.x). Within that, spark.memory.storageFraction (default 0.5) determines the split between Storage and Execution memory.
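As a concrete illustration, the on‑heap split for a hypothetical 4 GB executor under the Spark 2.x defaults (spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) can be worked out in a few lines of Scala. The formulas follow the region description above; the 4 GB figure is purely illustrative.

```scala
// Toy calculation of on-heap regions for a 4 GB executor (Spark 2.x defaults).
// All sizes in MB; values are illustrative, not read from a live executor.
val executorMemory  = 4096L   // --executor-memory 4g
val reservedMemory  = 300L    // fixed system reserve (SPARK-12081)
val memoryFraction  = 0.6     // spark.memory.fraction (2.x default)
val storageFraction = 0.5     // spark.memory.storageFraction (default)

val usableMemory    = executorMemory - reservedMemory          // 3796 MB
val unifiedMemory   = (usableMemory * memoryFraction).toLong   // 2277 MB shared by Execution + Storage
val storageMemory   = (unifiedMemory * storageFraction).toLong // 1138 MB initial Storage region
val executionMemory = unifiedMemory - storageMemory            // 1139 MB initial Execution region
val userMemory      = usableMemory - unifiedMemory             // 1519 MB for user data structures
```

Note that the Storage/Execution split computed here is only the initial boundary; as described below, the two regions can borrow from each other at runtime.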
1.2 Off‑heap Memory
Since Spark 1.6, off‑heap memory can be enabled to reduce GC overhead by setting spark.memory.offHeap.enabled=true and specifying its size with spark.memory.offHeap.size. Off‑heap memory only contains Execution and Storage regions; there is no separate User or Reserved memory.
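For example, off‑heap memory could be enabled at submission time like this (the 1 GB size is purely illustrative; choose a value appropriate for your workload, and remember it is allocated in addition to the on‑heap executor memory):

```
spark-submit \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=1g \
  ...
```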
2. Dynamic Memory Allocation
Unified memory management allows Execution and Storage memory to borrow from each other dynamically: if Execution memory is idle, Storage can use it, and vice versa. The borrowing is asymmetric, however: Execution can reclaim its space by evicting cached blocks that spilled beyond the region protected by spark.memory.storageFraction, while Storage cannot force Execution to release memory and must wait for it to be freed. This flexibility was not available in the static memory manager used before Spark 1.6.
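The asymmetry can be illustrated with a toy model in plain Scala. This is only a sketch of the borrowing/eviction idea, not Spark's actual UnifiedMemoryManager code; the class name and units are invented for illustration.

```scala
// Toy model of unified memory borrowing (illustrative only; Spark's real
// UnifiedMemoryManager is considerably more involved). Sizes are abstract units.
class ToyUnifiedPool(total: Long, storageFloor: Long) {
  var storageUsed = 0L
  var executionUsed = 0L

  // Storage may borrow idle Execution space, but can never evict Execution.
  def acquireStorage(n: Long): Boolean = {
    if (storageUsed + executionUsed + n <= total) { storageUsed += n; true }
    else false
  }

  // Execution may evict cached blocks, but only down to the storage floor
  // (the region protected by spark.memory.storageFraction).
  def acquireExecution(n: Long): Boolean = {
    val free = total - storageUsed - executionUsed
    if (free < n) {
      val evictable = math.max(0L, storageUsed - storageFloor)
      val toEvict = math.min(evictable, n - free)
      storageUsed -= toEvict // cached blocks are dropped to make room
    }
    if (total - storageUsed - executionUsed >= n) { executionUsed += n; true }
    else false
  }
}
```

With a pool of 100 units and a floor of 50, Storage can grow to 80 while Execution is idle; a later Execution request for 40 units then evicts 20 units of borrowed cache, leaving Storage at 60 — whereas in the reverse direction Storage would simply be refused.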
3. Legacy Mode
Although Spark 1.6 and later default to the unified manager, the older static manager (StaticMemoryManager) is still available via the spark.memory.useLegacyMode flag, which is false by default.
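Should the pre‑1.6 behavior be needed, for example to compare against an existing tuning, it can be re‑enabled at submit time:

```
spark-submit --conf spark.memory.useLegacyMode=true ...
```

In legacy mode the older settings such as spark.shuffle.memoryFraction and spark.storage.memoryFraction take effect again; they are ignored by the unified manager.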
4. Conclusion
From Spark 1.6.0 onward, the default memory management model is dynamic (unified) and continues through Spark 2.x. Understanding the division of on‑heap and off‑heap memory, as well as the key configuration parameters, is crucial for optimizing Spark jobs.