Mastering Spark’s Unified Memory Management: A Deep Dive into On‑Heap & Off‑Heap Tuning
This article explains Spark's unified memory manager, detailing on‑heap and off‑heap memory regions, dynamic memory sharing, task memory allocation, and practical tuning techniques to optimize performance and avoid common out‑of‑memory errors.
1. Spark Memory Model
1.1 Overview
Understanding Spark's memory management is essential for efficient resource allocation and tuning; it helps identify problematic memory regions without simply increasing memory size.
Versions prior to 1.6 used static memory management, while Spark 1.6 and later adopt a Unified Memory Manager. This article focuses on the unified approach.
The Spark UI "Executors" tab shows memory allocation for a task submitted in standalone client mode with the following configuration:
Command‑line options used:
--executor-memory 2g --driver-memory 1g --total-executor-cores 4

The unified memory manager comprises two main regions: On-heap Memory and Off-heap Memory.
1.2 On‑heap Memory
By default Spark uses only on‑heap memory, which is divided into four parts:
Execution Memory: stores temporary data for shuffle, join, sort, aggregation, etc.
Storage Memory: holds cached RDD data and unrolled data.
User Memory: keeps metadata such as RDD dependencies.
Reserved Memory: system-reserved space for internal Spark objects.
Key memory parameters:
systemMemory = Runtime.getRuntime.maxMemory (configured via spark.executor.memory or --executor-memory)
reservedMemory = 300MB in Spark 2.4.3 (modifiable in testing with spark.testing.reservedMemory)
usableMemory = systemMemory - reservedMemory
unifiedMemory = usableMemory * 0.6 (spark.memory.fraction, default 0.6)
Minimum system memory = reservedMemory * 1.5 = 450MB; an executor configured with less memory will fail to start.
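The sizing formulas above can be sketched in a few lines. This is an illustration of the arithmetic only, using the Spark 2.4 defaults quoted in this section; note that in a real executor, Runtime.getRuntime.maxMemory is usually somewhat less than the -Xmx value you configure.

```python
# Sketch of the on-heap sizing formulas above (Spark 2.4 defaults).
# All values in MB; `system_memory_mb` stands in for Runtime.getRuntime.maxMemory.

RESERVED_MB = 300            # reservedMemory (spark.testing.reservedMemory)
MEMORY_FRACTION = 0.6        # spark.memory.fraction default
STORAGE_FRACTION = 0.5       # spark.memory.storageFraction default

def on_heap_regions(system_memory_mb: float) -> dict:
    # Spark refuses to start an executor below reservedMemory * 1.5 = 450 MB.
    if system_memory_mb < RESERVED_MB * 1.5:
        raise ValueError("System memory must be at least 450 MB")
    usable = system_memory_mb - RESERVED_MB
    unified = usable * MEMORY_FRACTION          # Execution + Storage combined
    return {
        "reserved": RESERVED_MB,
        "user": usable * (1 - MEMORY_FRACTION),
        "storage": unified * STORAGE_FRACTION,
        "execution": unified * (1 - STORAGE_FRACTION),
    }

# A 2 GB executor (--executor-memory 2g, i.e. 2048 MB):
regions = on_heap_regions(2048)
# usable = 1748 MB, unified = 1048.8 MB, storage = execution = 524.4 MB
```

Running this for the 2 GB executor from the example configuration shows roughly 524 MB each for Storage and Execution and about 699 MB of User memory.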
1.3 Off‑heap Memory
Since Spark 1.6, off-heap memory can be enabled via spark.memory.offHeap.enabled and sized with spark.memory.offHeap.size. Off-heap memory is allocated outside the JVM using unsafe APIs, avoiding GC overhead but requiring manual allocation and release logic.
When enabled, both on‑heap and off‑heap regions coexist, and Execution and Storage memory are the sum of their respective on‑heap and off‑heap parts.
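As a concrete illustration, the two settings named above might appear in spark-defaults.conf as follows; the 1g size is an arbitrary example, and this memory is allocated outside the -Xmx heap.

```properties
# spark-defaults.conf -- enable off-heap Execution/Storage memory
spark.memory.offHeap.enabled   true
spark.memory.offHeap.size      1g
```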
1.4 Dynamic Memory Adjustment
Before Spark 1.6, Execution and Storage memory were statically partitioned; insufficient Execution memory could not borrow from free Storage memory. With the unified manager, the two regions can share space dynamically.
Implementation details:
Initial allocation: the boundary between the Storage and Execution regions is set via spark.memory.storageFraction (default 0.5).
If neither region has free space, new data is spilled to disk; cached blocks are evicted with an LRU policy.
When Execution needs space that Storage has borrowed, Storage evicts its blocks to disk and returns the borrowed memory.
Space borrowed by Execution cannot be reclaimed by Storage, because evicting data held by an in-flight shuffle is too complex; Storage must wait (or spill) until Execution releases it.
Borrowing only occurs between like‑type memories (both on‑heap or both off‑heap).
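The borrowing rules above can be made concrete with a toy model. This is an illustration only, not Spark's implementation: the pool lets Execution evict Storage blocks to reclaim space, while Storage cannot evict Execution and must fail (spill) instead.

```python
# Toy model of unified-memory borrowing (illustration only, not Spark's code).
# Execution may evict Storage blocks to make room for itself;
# Storage cannot evict Execution, so its caller must spill to disk instead.

class UnifiedPool:
    def __init__(self, total: float, storage_fraction: float = 0.5):
        self.total = total
        self.storage_region = total * storage_fraction  # soft boundary only
        self.execution_used = 0.0
        self.storage_used = 0.0

    def acquire_execution(self, n: float) -> float:
        free = self.total - self.execution_used - self.storage_used
        if n > free:
            # Evict cached blocks (LRU in real Spark) to free borrowed space.
            evict = min(self.storage_used, n - free)
            self.storage_used -= evict
        granted = min(n, self.total - self.execution_used - self.storage_used)
        self.execution_used += granted
        return granted

    def acquire_storage(self, n: float) -> bool:
        free = self.total - self.execution_used - self.storage_used
        if n > free:
            return False  # cannot evict Execution; caller must spill to disk
        self.storage_used += n
        return True

pool = UnifiedPool(total=1000)
pool.acquire_storage(800)     # Storage borrows past its 500-unit region
pool.acquire_execution(400)   # Execution evicts 200 units of Storage to fit
```

After the last call, Execution holds 400 units and Storage has been shrunk to 600; a further Storage request fails because Execution's space cannot be evicted.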
1.5 Task Memory Allocation
Tasks share Execution memory. Spark maintains a HashMap tracking each Task's memory usage. When a Task requests numBytes, Spark checks the available Execution memory and updates the map accordingly.
Each Task is guaranteed at least 1/(2N) of the total Execution memory, where N is the number of concurrently running Tasks, and may use at most 1/N; a request that cannot yet be satisfied at the 1/(2N) minimum blocks until memory is released. For example, with 10 GB of Execution memory and 5 Tasks, each Task can obtain between 1 GB and 2 GB.
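The per-task bounds are simple to compute; a minimal sketch of the arithmetic, reproducing the 10 GB / 5 task example above:

```python
# Per-task Execution memory bounds: each of N concurrent tasks is guaranteed
# at least 1/(2N) of the pool and capped at 1/N of it.

def task_memory_bounds(pool_size: float, n_tasks: int) -> tuple:
    lower = pool_size / (2 * n_tasks)   # guaranteed minimum per task
    upper = pool_size / n_tasks         # hard cap per task
    return lower, upper

# The example above: 10 GB of Execution memory shared by 5 tasks.
lo, hi = task_memory_bounds(10, 5)      # -> (1.0, 2.0) GB
```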
2. Spark Memory Tuning
2.1 Determine Memory Consumption
Create an RDD, cache it, and inspect the "Storage" tab in the Web UI to see its memory usage. Use SizeEstimator.estimate to estimate the size of specific objects, such as broadcast variables.
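SizeEstimator.estimate is a JVM-side utility; as a loose analogy only (not Spark's method), the idea of walking an object graph and summing shallow sizes can be sketched in Python with sys.getsizeof:

```python
# Loose Python analogue of deep object-size estimation (Spark's
# SizeEstimator.estimate does this on the JVM; this is only an illustration).
import sys

def deep_size(obj, seen=None) -> int:
    seen = seen if seen is not None else set()
    if id(obj) in seen:
        return 0                     # count shared objects only once
    seen.add(id(obj))
    size = sys.getsizeof(obj)        # shallow size of this object
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(x, seen) for x in obj)
    return size

payload = {"ids": list(range(1000)), "name": "broadcast-table"}
approx = deep_size(payload)          # deep size includes the 1000 boxed ints
```

The point of the analogy: the shallow size of a container badly understates its true footprint, which is exactly why measuring candidate broadcast variables before shipping them matters.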
2.2 Optimize Data Structures
Reduce memory overhead by avoiding pointer-heavy Java/Scala collections and using primitive arrays or specialized libraries like fastutil. Prefer flat structures, numeric or enum keys instead of strings, and enable -XX:+UseCompressedOops for heaps under 32 GB.
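The cost of boxed, pointer-heavy collections is easy to demonstrate. The sketch below uses Python rather than the JVM (the analogue of Integer[] vs int[]): a list stores pointers to full int objects, while array stores packed 8-byte values inline.

```python
# Pointer-heavy vs primitive storage, in Python terms (JVM analogue:
# Integer[] vs int[]). A list boxes every element; array packs raw values.
import sys
from array import array

n = 10_000
boxed = list(range(n))                  # list of pointers to int objects
primitive = array("q", range(n))        # packed 64-bit signed integers

# Deep size of the list = the pointer table plus every boxed int object.
boxed_deep = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
primitive_size = sys.getsizeof(primitive)
# boxed_deep is several times primitive_size for the same logical data
```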
2.3 Serialize RDD Storage
When RDDs are large, persist them in serialized form (e.g., StorageLevel.MEMORY_ONLY_SER), using Kryo for better efficiency. If OOM persists, consider StorageLevel.MEMORY_AND_DISK based on data size.
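Switching the serializer is a configuration change; the fragment below shows the relevant spark-defaults.conf keys (the registrationRequired line is optional and simply makes unregistered classes fail fast instead of silently falling back):

```properties
# spark-defaults.conf -- use Kryo for serialized caching and shuffle
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# Optional: raise an error when a class has not been registered with Kryo
spark.kryo.registrationRequired  true
```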
2.4 Adjust Parallelism
Set parallelism to roughly 2–3 times the total CPU cores. Tune spark.default.parallelism (effective during shuffle), use rdd.repartition to increase partitions, and configure spark.sql.shuffle.partitions (default 200) for Spark SQL.
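For a hypothetical job with 16 total executor cores, the 2–3x rule of thumb above suggests something like 48 partitions; as a configuration sketch:

```properties
# spark-defaults.conf -- parallelism for a hypothetical 16-core job (3x cores)
spark.default.parallelism      48
spark.sql.shuffle.partitions   48
```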
2.5 Broadcast Variables
Convert large read‑only objects on the driver into broadcast variables so that Executors fetch them from the nearest BlockManager, reducing network traffic.
2.6 Use Map‑Side Pre‑Aggregation
Perform local aggregation on each node (e.g., with reduceByKey or aggregateByKey) instead of groupByKey, to minimize the data transferred during shuffle.
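Why map-side combining shuffles less data can be shown with a toy model (plain Python, not Spark): with groupByKey-style behavior every record crosses the shuffle boundary, while reduceByKey-style behavior first collapses each partition to one record per key.

```python
# Toy model of map-side pre-aggregation (illustration only, not Spark code).
from collections import Counter

partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]

# groupByKey-style: every record crosses the shuffle boundary.
shuffled_without_combine = sum(len(p) for p in partitions)   # 7 records

# reduceByKey-style: combine locally, then shuffle one record per key
# per partition.
local = [Counter() for _ in partitions]
for part, acc in zip(partitions, local):
    for key, value in part:
        acc[key] += value
shuffled_with_combine = sum(len(acc) for acc in local)       # 4 records

# The final merge on the reduce side gives identical totals either way.
totals = Counter()
for acc in local:
    totals.update(acc)
```

Here 7 records shrink to 4 before the shuffle while producing the same totals; on real data with many repeats per key, the reduction is far larger.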
2.7 GC Optimization
GC tuning spans many aspects and deserves a dedicated article of its own.
3. Common Issues
Typical executor‑related errors include:
java.lang.OutOfMemoryError
ExecutorLostFailure
Executor exit code: 143
Heartbeat timeout
Shuffle file lost
Refer to the official Spark tuning guide and detailed memory management articles for further guidance.