Oct 27, 2022 · Big Data

Boost Spark Performance: Proven Code Optimizations & Tuning Tips

This article outlines practical Spark job optimization techniques: code-level improvements, resource tuning, data skew handling, persistence strategies, shuffle reduction, broadcast variables, Kryo serialization, and efficient data structures. It shows how each can cut execution time.

Big Data · Kryo Serialization · Performance Tuning

19 min read


Big Data Technology Architecture

Apr 28, 2019 · Big Data

Apache Spark Memory Management: Storage and Execution Memory (Part 2)

This article continues the deep dive into Apache Spark memory management. It explains storage memory handling (RDD persistence, caching, eviction, and disk spilling) and execution memory allocation for multi-tasking and shuffle operations, and it details Spark's internal structures such as BlockManager, StorageLevel, and Tungsten page management.

Apache Spark · Memory Management · RDD Persistence

13 min read
