Tagged articles
2 articles
Data Thinking Notes
Oct 27, 2022 · Big Data

Boost Spark Performance: Proven Code Optimizations & Tuning Tips

This article outlines practical Spark job optimization techniques: code-level improvements, resource tuning, data skew handling, persistence strategies, shuffle reduction, broadcast variables, Kryo serialization, and efficient data structures. It shows how each one can cut execution time.

Big Data · Kryo Serialization · RDD Persistence
19 min read
Big Data Technology Architecture
Apr 28, 2019 · Big Data

Apache Spark Memory Management: Storage and Execution Memory (Part 2)

This article continues the deep dive into Apache Spark memory management. It explains storage memory handling (RDD persistence, caching, eviction, and spilling to disk) and execution memory allocation across concurrent tasks and shuffle operations, and it details Spark’s internal structures such as BlockManager, StorageLevel, and Tungsten page management.

Apache Spark · Memory Management · RDD Persistence
13 min read