Understanding Spark’s Memory Model: Unified Memory Management, On‑Heap and Off‑Heap Memory, and Configuration

This article explains Spark’s unified memory management model, detailing the division between on‑heap and off‑heap memory, the roles of execution, storage, user, and reserved memory, configuration parameters, dynamic allocation, and how these concepts affect performance and resource utilization.

Big Data Technology & Architecture

Dec 6, 2021

Preface

In previous lessons we covered Spark RDDs, the key components of the Spark system, and the important concept of Shuffle. This lesson focuses on Spark’s memory model, which determines how much memory a Spark job needs and how that memory is used during execution.

Memory Partitioning

Since Spark 1.6, and by default in Spark 2.0 and later, Spark uses Unified Memory Management. An executor’s memory consists of two main regions: on‑heap memory (On‑heap Memory) and off‑heap memory (Off‑heap Memory).

Off‑heap memory is allocated and managed directly by Spark in the system memory of each worker node, outside the JVM heap. It is divided into Execution memory and Storage memory only, and user code cannot access it directly. Off‑heap memory is controlled by the parameters spark.memory.offHeap.enabled and spark.memory.offHeap.size.

On‑heap memory relies on the JVM and contains:

Execution Memory: temporary data for Shuffle, Join, Sort, Aggregation, etc.

Storage Memory: cached RDD data, broadcast data, and unrolled data.

User Memory: user‑defined data structures and Spark metadata such as RDD dependency information.

Reserved Memory: system‑reserved space for internal Spark objects.

Configuration

On‑Heap Memory Settings

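Reserved Memory: 300 MB by default and usually left unchanged. An executor whose heap is smaller than 1.5 × 300 = 450 MB cannot start.

Storage Memory: holds broadcast data and cached RDDs.

In Spark 2.0+ the defaults are spark.memory.fraction = 0.6 and spark.memory.storageFraction = 0.5, so Storage and Execution memory each start with roughly 0.6 × 0.5 = 30 % of the heap that remains after Reserved Memory is subtracted. The other 40 % of that remainder is User Memory.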

Under the unified memory mode, these two parts can borrow space from each other.
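The on‑heap arithmetic above can be worked through in a few lines. The following is a minimal sketch in plain Python, not Spark code: the function name on_heap_regions and the returned dictionary are hypothetical, and it assumes the defaults named above (300 MB reserved, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5).

```python
RESERVED_MB = 300  # Reserved Memory, fixed by default


def on_heap_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate the initial size (in MB) of each on-heap region.

    heap_mb is the executor heap size; the parameter names mirror
    spark.memory.fraction and spark.memory.storageFraction.
    """
    min_heap_mb = 1.5 * RESERVED_MB  # 450 MB with the default reservation
    if heap_mb < min_heap_mb:
        raise ValueError(
            f"heap of {heap_mb} MB is below the {min_heap_mb:.0f} MB minimum"
        )

    usable = heap_mb - RESERVED_MB          # what is left after the reservation
    unified = usable * memory_fraction      # shared Storage + Execution pool
    storage = unified * storage_fraction    # initial Storage region
    execution = unified - storage           # initial Execution region
    user = usable - unified                 # User Memory

    return {
        "reserved": RESERVED_MB,
        "storage": round(storage, 1),
        "execution": round(execution, 1),
        "user": round(user, 1),
    }


if __name__ == "__main__":
    for name, size in on_heap_regions(4300).items():
        print(f"{name:>9}: {size} MB")
```

For a 4300 MB heap this gives about 1200 MB each for Storage and Execution, which is 30 % of the 4000 MB left after the reservation rather than 30 % of the full heap. The heap size the JVM actually reports to Spark is usually somewhat below the configured value, so real figures will be slightly smaller.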

Off‑Heap Memory Settings

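Off‑heap memory was introduced in Spark 1.6. It bypasses the JVM heap by using the JDK’s Unsafe API to allocate memory directly from the operating system, much as malloc() does in C. Because this memory is not managed by the garbage collector, GC overhead drops, but Spark must implement the allocation and release logic itself.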

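Off‑heap memory can be of two logical types: DirectMemory and JVM Overhead. It is disabled by default; enable it with spark.memory.offHeap.enabled and set its size in bytes with spark.memory.offHeap.size, which must be positive when off‑heap memory is enabled. When it is enabled, an executor’s Execution memory is the sum of its on‑heap and off‑heap Execution memory, and its Storage memory is the sum of its on‑heap and off‑heap Storage memory.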
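As a rough illustration of these settings, the sketch below builds the two off‑heap properties as spark-submit --conf arguments and adds up the combined pools. It is plain Python rather than the Spark API: the helper names offheap_conf, to_submit_args, and combined_pools are hypothetical, and the sketch assumes the off‑heap area is split between Storage and Execution by the same storage fraction (0.5 by default) as the on‑heap pool.

```python
def offheap_conf(size_bytes):
    """Return the two properties that turn on off-heap memory."""
    if size_bytes <= 0:
        raise ValueError("spark.memory.offHeap.size must be positive when enabled")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": str(size_bytes),
    }


def to_submit_args(conf):
    """Render a property dict as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def combined_pools(on_heap_storage, on_heap_execution, off_heap_size,
                   storage_fraction=0.5):
    """Add the off-heap share to each on-heap pool (all sizes in one unit)."""
    off_heap_storage = int(off_heap_size * storage_fraction)
    off_heap_execution = off_heap_size - off_heap_storage
    return {
        "storage": on_heap_storage + off_heap_storage,
        "execution": on_heap_execution + off_heap_execution,
    }


if __name__ == "__main__":
    conf = offheap_conf(2 * 1024 ** 3)  # 2 GiB of off-heap memory
    print(" ".join(to_submit_args(conf)))
    print(combined_pools(1200, 1200, 1000))
```

Off‑heap memory has no Reserved or User region, so the whole configured size goes to Storage and Execution.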

Dynamic Adjustment

Since Spark 1.6, the unified memory manager allows Storage and Execution memory to share the same pool and dynamically borrow idle space from each other. The key rules are:

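Set a base Storage region with spark.memory.storageFraction (default 0.5) to define each side’s share of the unified pool.

If both sides are out of space, data spills to disk; if one side is out of space and the other has free space, it can borrow the other side’s free space.

When Storage has borrowed Execution’s space, Execution can force Storage to evict the borrowed blocks (writing them to disk when the storage level allows) and return the space. When Execution has borrowed Storage’s space, Storage cannot force Execution to give it back and must wait until Execution releases it.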
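The borrowing rules can be sketched as a toy model. The class UnifiedPool and its methods are hypothetical and are not Spark’s API; the model tracks only byte counts, whereas real Spark evicts whole blocks and keeps separate pools for on‑heap and off‑heap memory.

```python
class UnifiedPool:
    """Toy model of a shared Storage + Execution pool (sizes in arbitrary units)."""

    def __init__(self, total, storage_fraction=0.5):
        self.total = total
        # Storage usage up to this size is protected from eviction by Execution.
        self.storage_region = int(total * storage_fraction)
        self.storage_used = 0
        self.execution_used = 0

    @property
    def free(self):
        return self.total - self.storage_used - self.execution_used

    def acquire_storage(self, amount):
        """Storage may take any free space but can never evict Execution."""
        if amount > self.free:
            return False  # caller would have to drop or spill its own blocks
        self.storage_used += amount
        return True

    def acquire_execution(self, amount):
        """Execution takes free space, then reclaims what Storage borrowed."""
        if amount > self.free:
            shortfall = amount - self.free
            borrowed_by_storage = max(0, self.storage_used - self.storage_region)
            evicted = min(shortfall, borrowed_by_storage)
            self.storage_used -= evicted  # Storage evicts blocks to give space back
        granted = min(amount, self.free)
        self.execution_used += granted
        return granted


if __name__ == "__main__":
    pool = UnifiedPool(total=1000)
    pool.acquire_storage(800)            # Storage borrows 300 beyond its region
    print(pool.acquire_execution(400))   # forces Storage to evict 200
    print(pool.acquire_execution(200))   # only 100 more can be reclaimed
    print(pool.acquire_storage(100))     # Storage cannot evict Execution
```

In this run Execution can push Storage back to its 500‑unit region but no further, and Storage’s final request fails because every remaining unit is held by Execution.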

Memory Management Details

Storage memory uses a LinkedHashMap to manage Blocks, each representing a cached RDD partition. Execution memory uses an AppendOnlyMap for Shuffle data and, with Tungsten, employs a page‑based memory model where each page is a MemoryBlock identified by an object reference and offset (on‑heap) or a raw address (off‑heap).
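To illustrate why a LinkedHashMap suits block management, here is a minimal sketch of least‑recently‑used eviction using Python’s OrderedDict as a stand‑in. The class ToyMemoryStore and the block ids are hypothetical; the sketch assumes access‑ordered eviction and leaves out details of the real store, such as writing evicted blocks to disk.

```python
from collections import OrderedDict


class ToyMemoryStore:
    """Keeps block sizes in access order and evicts the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.blocks = OrderedDict()  # block id -> size, oldest access first

    def get(self, block_id):
        """Return the block size and mark the block as most recently used."""
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)
        return self.blocks[block_id]

    def put(self, block_id, size):
        """Cache a block, evicting old blocks if needed; return evicted ids."""
        if size > self.capacity:
            return None  # the block can never fit in this store
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)  # replace an existing entry
        evicted = []
        while self.used + size > self.capacity:
            old_id, old_size = self.blocks.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self.blocks[block_id] = size
        self.used += size
        return evicted


if __name__ == "__main__":
    store = ToyMemoryStore(capacity=100)
    store.put("rdd_0_0", 40)
    store.put("rdd_0_1", 40)
    store.get("rdd_0_0")                 # rdd_0_0 becomes most recently used
    print(store.put("rdd_1_0", 40))      # evicts the least recently used block
```

Because reading rdd_0_0 moves it to the end of the ordering, the later put evicts rdd_0_1 instead.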

Shuffle write and read stages allocate memory as follows:

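Shuffle Write: with the ordinary sort path, uses ExternalSorter, which occupies on‑heap Execution memory; with the Tungsten sort path, uses ShuffleExternalSorter, which sorts serialized data and can occupy off‑heap Execution memory when off‑heap memory is enabled, or on‑heap Execution memory otherwise.

Shuffle Read: aggregates data with Aggregator, using on‑heap Execution memory.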

When Execution memory runs short, Spark spills the in‑memory data to temporary files on disk and later merges the spilled files with whatever remains in memory.
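The spill‑then‑merge pattern can be sketched as a small external sort. This is an illustration of the general technique, not Spark’s sorter: the function external_sort and its max_in_memory threshold are hypothetical, records are limited to integers, and a simple record count stands in for Spark’s memory accounting.

```python
import heapq
import os
import tempfile


def external_sort(records, max_in_memory):
    """Sort integers, spilling a sorted run to disk whenever the buffer fills.

    Returns the sorted list and the number of spill files that were written.
    """
    spill_paths = []
    buffer = []

    def spill():
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "w") as handle:
            for record in buffer:
                handle.write(f"{record}\n")
        spill_paths.append(path)
        buffer.clear()

    def read_run(path):
        with open(path) as handle:
            for line in handle:
                yield int(line)

    for record in records:
        buffer.append(record)
        if len(buffer) >= max_in_memory:
            spill()  # memory "pressure": write the sorted buffer to disk

    buffer.sort()  # whatever is left stays in memory
    try:
        runs = [read_run(path) for path in spill_paths]
        # Merge the in-memory run with every spilled run in one pass.
        merged = list(heapq.merge(buffer, *runs))
        return merged, len(spill_paths)
    finally:
        for path in spill_paths:
            os.remove(path)


if __name__ == "__main__":
    result, spills = external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3)
    print(result, "spill files:", spills)
```

Each spilled run is already sorted, so the final merge only has to hold one record per run in memory at a time.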

Storage vs Execution Management

Storage memory is managed by a master‑slave BlockManager architecture, while Execution memory relies on AppendOnlyMap and Tungsten’s page abstraction, allowing unified addressing of both on‑heap and off‑heap pages.
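One way to picture the unified addressing is a single 64‑bit value that carries a page number in its upper bits and an offset within the page in its lower bits. The sketch below assumes a 13‑bit page number and a 51‑bit offset, which is the split I understand Tungsten’s task memory manager to use; treat those widths, and the function names encode_address and decode_address, as assumptions for illustration.

```python
PAGE_NUMBER_BITS = 13                      # assumed width of the page number
OFFSET_BITS = 64 - PAGE_NUMBER_BITS        # remaining 51 bits hold the offset
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def encode_address(page_number, offset_in_page):
    """Pack a page number and an in-page offset into one 64-bit value."""
    if not 0 <= page_number < (1 << PAGE_NUMBER_BITS):
        raise ValueError("page number does not fit in 13 bits")
    if not 0 <= offset_in_page <= OFFSET_MASK:
        raise ValueError("offset does not fit in 51 bits")
    return (page_number << OFFSET_BITS) | offset_in_page


def decode_address(address):
    """Split an encoded value back into (page_number, offset_in_page)."""
    return address >> OFFSET_BITS, address & OFFSET_MASK


if __name__ == "__main__":
    address = encode_address(page_number=3, offset_in_page=128)
    print(hex(address), decode_address(address))
```

The same encoded value works for both memory modes: the page number selects an entry in a page table, and that entry resolves either to an on‑heap object plus offset or to an off‑heap base address.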

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
