Mastering Spark’s Unified Memory Management: A Deep Dive into On‑Heap & Off‑Heap Tuning
This article explains Spark's unified memory manager, detailing on‑heap and off‑heap memory regions, dynamic memory sharing, task memory allocation, and practical tuning techniques to optimize performance and avoid common out‑of‑memory errors.
1. Spark Memory Model
1.1 Overview
Understanding Spark's memory management is essential for efficient resource allocation and tuning; it helps identify problematic memory regions without simply increasing memory size.
Versions prior to 1.6 used static memory management, while Spark 1.6 and later adopt a Unified Memory Manager. This article focuses on the unified approach.
The Spark UI "Executors" tab shows memory allocation for a task submitted in standalone client mode with the following configuration:
Command‑line options used:
--executor-memory 2g --driver-memory 1g --total-executor-cores 4
The unified memory manager comprises two main regions: On‑heap Memory and Off‑heap Memory.
1.2 On‑heap Memory
By default Spark uses only on‑heap memory, which is divided into four parts:
Execution Memory: stores temporary data for shuffle, join, sort, aggregation, etc.
Storage Memory: holds cached RDD data and unrolled data.
User Memory: holds user-defined data structures and Spark's internal metadata, such as RDD dependency information.
Reserved Memory: system‑reserved space for internal Spark objects.
Key memory parameters:
systemMemory = Runtime.getRuntime.maxMemory (configured via spark.executor.memory or --executor-memory)
reservedMemory = 300MB in Spark 2.4.3 (modifiable in testing with spark.testing.reservedMemory)
usableMemory = systemMemory - reservedMemory
unifiedMemory = usableMemory * 0.6 (default 60% share)
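Minimum system memory = reservedMemory * 1.5 = 450MB. This is a floor on the heap (systemMemory), not on an individual task: if the configured heap is smaller, Spark fails at startup and asks for a larger heap size.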
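The arithmetic behind these parameters can be sketched in a few lines of plain Python. This is a minimal illustration, not Spark's own code: the function name on_heap_regions and the dictionary keys are hypothetical, the 0.5 storage share assumes the default spark.memory.storageFraction, and systemMemory is taken to be exactly the configured 2g, whereas the JVM usually reports a somewhat smaller Runtime.getRuntime.maxMemory.

```python
MB = 1024 * 1024


def on_heap_regions(system_memory, memory_fraction=0.6, storage_fraction=0.5,
                    reserved_memory=300 * MB):
    """Split an executor heap into the four on-heap regions (sizes in bytes)."""
    # Floor on the heap: reservedMemory * 1.5 (450MB with the default reserve).
    min_system_memory = int(reserved_memory * 1.5)
    if system_memory < min_system_memory:
        raise ValueError(
            f"system memory {system_memory} must be at least {min_system_memory}"
        )
    usable = system_memory - reserved_memory
    unified = int(usable * memory_fraction)      # Execution + Storage
    storage = int(unified * storage_fraction)    # initial Storage share
    return {
        "reserved": reserved_memory,
        "user": usable - unified,
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,
    }


# Assumed input: the full 2g from --executor-memory is visible to Spark.
regions = on_heap_regions(2048 * MB)
for name, size in regions.items():
    print(f"{name:>10}: {size / MB:8.1f} MB")
```

Because the real systemMemory is typically below the configured value, the figures shown in the Spark UI will usually be lower than what this sketch prints.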
1.3 Off‑heap Memory
Since Spark 1.6, off‑heap memory can be enabled via spark.memory.offHeap.enabled and sized with spark.memory.offHeap.size. Off‑heap memory is allocated outside the JVM using unsafe APIs, avoiding GC overhead but requiring manual allocation and release logic.
When enabled, both on‑heap and off‑heap regions coexist, and Execution and Storage memory are the sum of their respective on‑heap and off‑heap parts.
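As a rough illustration of how the totals add up, the sketch below combines the two regions in plain Python. The helper combined_regions and the sample sizes are hypothetical, and it assumes the off-heap region is split between Storage and Execution by the same spark.memory.storageFraction (0.5 by default) as the on-heap region.

```python
MB = 1024 * 1024

# Settings named in the text; the 1g size is only an example value.
off_heap_conf = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "1g",
}


def combined_regions(on_heap_storage, on_heap_execution, off_heap_size,
                     storage_fraction=0.5):
    """Total Storage and Execution memory as on-heap plus off-heap parts."""
    # Assumption: off-heap memory is divided by the same storage fraction.
    off_heap_storage = int(off_heap_size * storage_fraction)
    off_heap_execution = off_heap_size - off_heap_storage
    return {
        "storage": on_heap_storage + off_heap_storage,
        "execution": on_heap_execution + off_heap_execution,
    }


# Example: 524 MB each of on-heap Storage and Execution, plus 1 GB off-heap.
totals = combined_regions(524 * MB, 524 * MB, 1024 * MB)
print({name: size // MB for name, size in totals.items()})
```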
1.4 Dynamic Memory Adjustment
Before Spark 1.6, Execution and Storage memory were statically partitioned; insufficient Execution memory could not borrow from free Storage memory. With the unified manager, the two regions can share space dynamically.
Implementation details:
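Initial allocation: the boundary between Storage and Execution memory is set via spark.memory.storageFraction.
If one side is short of space while the other has free space, it can borrow from the other side; if both sides lack space, data is spilled to disk, with cached blocks evicted using an LRU policy.
When Storage has taken over part of Execution's space, Execution can force Storage to evict those blocks to disk and return the borrowed memory.
When Execution has taken over part of Storage's space, Storage cannot force Execution to return it, because the shuffle process makes this too complex to implement.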
Borrowing only occurs between like‑type memories (both on‑heap or both off‑heap).
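The borrowing rules can be made concrete with a deliberately simplified toy model. The class UnifiedPool below is hypothetical and is not Spark's implementation: it tracks only byte counts, ignores individual blocks and LRU ordering, and assumes Execution may reclaim only the part of Storage that exceeds Storage's initial region.

```python
class UnifiedPool:
    """Toy model of one unified region shared by Storage and Execution."""

    def __init__(self, total, storage_fraction=0.5):
        self.total = total
        # Storage's initial share; cached data below this line is not evicted
        # on behalf of Execution in this model.
        self.storage_region = int(total * storage_fraction)
        self.storage_used = 0
        self.execution_used = 0

    @property
    def free(self):
        return self.total - self.storage_used - self.execution_used

    def acquire_storage(self, n):
        """Cache n bytes if they fit; Storage never evicts Execution."""
        if n > self.free:
            return False
        self.storage_used += n
        return True

    def acquire_execution(self, n):
        """Grant up to n bytes, evicting Storage's borrowed part if needed."""
        if n > self.free:
            borrowed = max(0, self.storage_used - self.storage_region)
            self.storage_used -= min(borrowed, n - self.free)
        granted = min(n, self.free)
        self.execution_used += granted
        return granted


pool = UnifiedPool(1000)
cached = pool.acquire_storage(800)        # Storage borrows 300 beyond its 500
granted = pool.acquire_execution(400)     # 200 free, so 200 cached bytes are evicted
refused = pool.acquire_storage(100)       # pool is full; Execution is not evicted
print(cached, granted, refused, pool.storage_used, pool.execution_used)
```

In this run Storage shrinks from 800 to 600 so that Execution can obtain its 400, while the later Storage request is simply refused, mirroring the asymmetry described above.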
1.5 Task Memory Allocation
Tasks share Execution memory. Spark maintains a HashMap tracking each Task's memory usage. When a Task requests numBytes, Spark checks available Execution memory and updates the map accordingly.
Each Task must be able to acquire at least 1/(2N) of the total Execution memory before it proceeds (where N is the number of concurrently running Tasks), and it can hold at most 1/N. The usable range per Task is therefore 1/(2N) to 1/N. For example, with 10 GB Execution memory and 5 Tasks, each Task can obtain between 1 GB and 2 GB.
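The per-task range is simple enough to check by hand, but a small helper makes it easy to try other values. The function name task_memory_bounds is hypothetical; it only reproduces the 1/(2N) and 1/N arithmetic from the text.

```python
GB = 1024 ** 3


def task_memory_bounds(execution_memory, active_tasks):
    """Return (minimum, maximum) Execution memory per task, in bytes."""
    if active_tasks < 1:
        raise ValueError("active_tasks must be at least 1")
    lower = execution_memory / (2 * active_tasks)   # 1/(2N) of the pool
    upper = execution_memory / active_tasks         # 1/N of the pool
    return lower, upper


# The example from the text: 10 GB of Execution memory shared by 5 tasks.
lower, upper = task_memory_bounds(10 * GB, 5)
print(lower / GB, upper / GB)  # prints: 1.0 2.0
```

Note that N is the number of tasks running at that moment, so the bounds shift as tasks start and finish.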
2. Spark Memory Tuning
2.1 Determine Memory Consumption
Create an RDD, cache it, and inspect the "Storage" tab in the Web UI to see its memory usage. Use SizeEstimator.estimate to estimate the size of specific objects, such as broadcast variables.
2.2 Optimize Data Structures
Reduce memory overhead by avoiding pointer‑heavy Java/Scala collections and using primitive arrays or specialized libraries like fastutil. Prefer flat structures, numeric or enum keys instead of strings, and enable -XX:+UseCompressedOops for JVMs with less than 32 GB of RAM.
2.3 Serialize RDD Storage
When RDDs are large, persist them with serialization (e.g., StorageLevels.MEMORY_ONLY_SER) using Kryo for better efficiency. If OOM persists, consider StorageLevels.MEMORY_AND_DISK based on data size.
2.4 Adjust Parallelism
Set parallelism to roughly 2–3 times the total CPU cores. Tune spark.default.parallelism (effective during shuffle), use rdd.repartition to increase partitions, and configure spark.sql.shuffle.partitions (default 200) for SparkSQL.
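As a quick sketch of the rule of thumb, the helper below turns a core count into a suggested partition range and fills in the two settings mentioned above. The function suggested_partitions is hypothetical, and choosing the upper end of the range for both settings is only an illustrative assumption; the right value depends on data size and skew.

```python
def suggested_partitions(total_cores, low_factor=2, high_factor=3):
    """Return the (low, high) partition counts for 2-3x the total cores."""
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    return total_cores * low_factor, total_cores * high_factor


# Assumed input: the 4 cores from --total-executor-cores 4 earlier in the article.
low, high = suggested_partitions(4)

# Settings named in the text, here set to the upper end of the range.
parallelism_conf = {
    "spark.default.parallelism": str(high),
    "spark.sql.shuffle.partitions": str(high),
}
print(low, high)  # prints: 8 12
```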
2.5 Broadcast Variables
Convert large read‑only objects on the driver into broadcast variables so that Executors fetch them from the nearest BlockManager, reducing network traffic.
2.6 Use Map‑Side Pre‑Aggregation
Prefer operators that aggregate locally on each node before the shuffle (e.g., reduceByKey or aggregateByKey) over groupByKey, which sends every record across the network; this minimizes the data transferred during shuffle.
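The effect of pre-aggregation can be shown without a cluster. The plain-Python sketch below only simulates the idea and does not use Spark: the functions are hypothetical stand-ins, each inner list plays the role of one partition, and "shuffled" records are simply the ones that would have to leave the partition.

```python
from collections import Counter


def shuffle_without_combine(partitions):
    """groupByKey-style: every (key, value) record crosses the network."""
    return [record for part in partitions for record in part]


def shuffle_with_combine(partitions):
    """reduceByKey-style: sum values per key inside each partition first."""
    shuffled = []
    for part in partitions:
        local = Counter()
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())   # one record per key per partition
    return shuffled


def final_sums(records):
    """Reduce-side aggregation over whatever records were shuffled."""
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return dict(totals)


partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("b", 1)],
]
plain = shuffle_without_combine(partitions)
combined = shuffle_with_combine(partitions)
print(len(plain), len(combined))  # prints: 8 4
```

Both paths produce the same per-key totals; the pre-aggregated path just moves fewer records, and the saving grows with the number of repeated keys per partition.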
2.7 GC Optimization
GC tuning involves many aspects and can be covered in a dedicated article.
3. Common Issues
Typical executor‑related errors include:
java.lang.OutOfMemoryError
ExecutorLostFailure
Executor exit code: 143
Heartbeat timeout
Shuffle file lost
Refer to the official Spark tuning guide and detailed memory management articles for further guidance.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
