Key Spark Configuration Parameters and Their Explanations
This article lists commonly tuned Spark configuration settings, covering executor memory, off-heap memory, memory fractions, shuffle options, and Adaptive Query Execution (AQE) parameters, with a short description of each to help users tune Spark applications.
Basic Configuration
spark.executor.memory
Amount of JVM heap memory allocated to each executor process (default 1g). Off-heap memory and memory overhead are configured separately.
spark.memory.offHeap.enabled
Enables off-heap memory for certain operations (default false). When enabled, spark.memory.offHeap.size must be positive.
spark.memory.offHeap.size
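Absolute amount of off-heap memory available for allocation, in bytes unless a unit is specified (default 0). It does not affect heap usage, so account for it when sizing the executor's total memory.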
spark.memory.fraction
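Fraction of (JVM heap minus 300 MiB of reserved memory) shared by Spark's execution and storage memory (default 0.6). The remainder is left for user data structures and Spark's internal metadata.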
spark.memory.storageFraction
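Fraction of the region sized by spark.memory.fraction within which cached blocks are immune to eviction by execution (default 0.5). This is not a hard split: execution may borrow unused storage memory and storage may borrow unused execution memory, so execution memory is not simply 1 - spark.memory.storageFraction.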
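The arithmetic behind the two fractions can be sketched as follows. This is a rough estimate under stated assumptions, not Spark's own code: the function name unified_memory_regions is hypothetical, the heap size is taken to equal spark.executor.memory (the JVM usually reports slightly less), and the 300 MiB figure is the reserved memory Spark subtracts before applying spark.memory.fraction.

```python
# Reserved heap that Spark sets aside before applying spark.memory.fraction.
RESERVED_BYTES = 300 * 1024 * 1024


def unified_memory_regions(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate the on-heap unified region and its storage/execution split.

    Hypothetical helper for illustration; the defaults mirror
    spark.memory.fraction and spark.memory.storageFraction.
    """
    usable = heap_bytes - RESERVED_BYTES
    if usable <= 0:
        raise ValueError("heap is smaller than the reserved memory")
    unified = usable * memory_fraction
    # Cached blocks inside this share are immune to eviction by execution.
    storage = unified * storage_fraction
    # Starting share for execution; it can still borrow unused storage memory.
    execution = unified - storage
    return {"unified": unified, "storage": storage, "execution": execution}


MIB = 1024 ** 2
regions = unified_memory_regions(4 * 1024 ** 3)  # assumes a 4 GiB executor heap
for name, size in regions.items():
    print(f"{name}: {size / MIB:.1f} MiB")
# unified: 2277.6 MiB
# storage: 1138.8 MiB
# execution: 1138.8 MiB
```

The storage and execution figures are starting points rather than hard limits, since either side can borrow memory the other is not using.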
spark.local.dir
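Directory, or comma-separated list of directories, used as scratch space, including map output files and RDDs stored on disk (default /tmp). Cluster managers can override it, for example via SPARK_LOCAL_DIRS in standalone mode or LOCAL_DIRS on YARN.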
spark.cores.max
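Maximum total number of CPU cores the application may request across the whole cluster, not per machine. It applies to standalone mode and Mesos coarse-grained mode.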
spark.executor.cores
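Number of cores allocated to each executor. The default is 1 on YARN and Kubernetes, and all available cores on the worker in standalone mode.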
spark.task.cpus
Number of CPU cores reserved for each task (default 1).
spark.default.parallelism
Default number of partitions in RDDs returned by transformations such as join, reduceByKey, and parallelize when the caller does not set one.
spark.sql.shuffle.partitions
Number of partitions used when shuffling data for joins and aggregations in Spark SQL and the DataFrame API (default 200).
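How the core settings combine into concurrent task slots can be sketched like this. It is a simplified model, not Spark's scheduler: the helper task_slots is hypothetical, and the executor count assumes standalone mode with enough free worker resources, where the application's total cores are capped by spark.cores.max.

```python
def task_slots(executor_cores, task_cpus=1, cores_max=None):
    """Return (slots per executor, executor count, total slots).

    Hypothetical helper. The last two values are None when cores_max is not
    given, because the executor count then depends on the cluster manager.
    """
    if task_cpus < 1 or executor_cores < task_cpus:
        raise ValueError("each executor needs at least spark.task.cpus cores")
    # An executor runs as many tasks at once as its cores can cover.
    slots_per_executor = executor_cores // task_cpus
    if cores_max is None:
        return slots_per_executor, None, None
    # Assumption: executors are granted until spark.cores.max is used up.
    executors = cores_max // executor_cores
    return slots_per_executor, executors, executors * slots_per_executor


print(task_slots(executor_cores=4, task_cpus=1, cores_max=16))  # (4, 4, 16)
print(task_slots(executor_cores=4, task_cpus=2, cores_max=16))  # (2, 4, 8)
```

A common rule of thumb is to set the partition counts (spark.default.parallelism and spark.sql.shuffle.partitions) to a small multiple of the total slot count, so that every slot stays busy.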
Shuffle Configuration
spark.shuffle.file.buffer
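Size of the in-memory buffer for each shuffle file output stream (default 32k). A larger buffer reduces the number of disk seeks and system calls made while writing intermediate shuffle files.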
spark.reducer.maxSizeInFlight
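Maximum total size of map outputs that each reduce task fetches simultaneously (default 48m). Each reduce task needs a buffer to receive this data, so the value is a fixed memory overhead per reduce task.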
spark.shuffle.sort.bypassMergeThreshold
With the sort-based shuffle manager, if there is no map-side aggregation and the number of reduce partitions is at most this threshold (default 200), Spark skips the merge sort: each map task writes one temporary file per reduce partition and then concatenates them into a single output file.
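The bypass condition can be written as a small predicate. This is an illustrative sketch of the rule as described in the Spark documentation; the function and parameter names are hypothetical.

```python
def uses_bypass_merge_sort(num_reduce_partitions, map_side_combine, threshold=200):
    """Return True when the shuffle write would skip the merge sort.

    threshold mirrors spark.shuffle.sort.bypassMergeThreshold.
    """
    if map_side_combine:
        # Map-side aggregation needs the sorting path, so no bypass.
        return False
    return num_reduce_partitions <= threshold


print(uses_bypass_merge_sort(200, map_side_combine=False))  # True
print(uses_bypass_merge_sort(201, map_side_combine=False))  # False
print(uses_bypass_merge_sort(50, map_side_combine=True))    # False
```

The bypass path keeps one open file per reduce partition in each map task, which is why it is restricted to shuffles with few reduce partitions.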
Spark SQL Configuration
spark.sql.adaptive.enabled
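Enables Adaptive Query Execution (AQE), which re-optimizes the query plan during execution using runtime statistics. It is enabled by default since Spark 3.2.0.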
spark.sql.adaptive.coalescePartitions.enabled
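When true and spark.sql.adaptive.enabled is also true, Spark coalesces contiguous small shuffle partitions toward the size given by spark.sql.adaptive.advisoryPartitionSizeInBytes (default true).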
spark.sql.adaptive.advisoryPartitionSizeInBytes
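Advisory size in bytes of a shuffle partition during adaptive optimization (default 64MB). It takes effect when Spark coalesces small shuffle partitions or splits a skewed shuffle partition.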
spark.sql.adaptive.coalescePartitions.minPartitionNum
Minimum number of shuffle partitions after coalescing. If unset, it defaults to the default parallelism of the Spark cluster.
spark.sql.adaptive.fetchShuffleBlocksInBatch
When true, fetches contiguous shuffle blocks for the same map task in one batch rather than one by one, which reduces I/O (default true).
spark.sql.adaptive.skewJoin.enabled
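When true and spark.sql.adaptive.enabled is also true, Spark handles skew in sort-merge joins at runtime by splitting skewed partitions, replicating the other side if needed.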
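To make the coalescing settings in this section concrete, here is a simplified greedy sketch of how contiguous small partitions could be merged toward the advisory size. It is an illustration only and not Spark's implementation: the function coalesce_partitions is hypothetical, sizes are plain integers, and the minimum partition count is treated as a soft cap on the target size rather than a guarantee.

```python
def coalesce_partitions(sizes, advisory_size, min_partition_num=1):
    """Greedily merge adjacent partitions; return groups of partition indices.

    advisory_size mirrors spark.sql.adaptive.advisoryPartitionSizeInBytes and
    min_partition_num mirrors spark.sql.adaptive.coalescePartitions.minPartitionNum.
    """
    if not sizes:
        return []
    # Shrink the target when the data is small, so the merge can still
    # produce roughly min_partition_num groups (ceiling division).
    per_group = -(-sum(sizes) // min_partition_num)
    target = min(advisory_size, max(1, per_group))
    groups, current, current_size = [], [], 0
    for index, size in enumerate(sizes):
        # Close the current group when adding this partition would exceed the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    groups.append(current)
    return groups


sizes_mib = [10, 20, 30, 70, 5, 5]  # made-up partition sizes for illustration
print(coalesce_partitions(sizes_mib, advisory_size=64))
# [[0, 1, 2], [3], [4, 5]]
print(coalesce_partitions(sizes_mib, advisory_size=64, min_partition_num=4))
# [[0, 1], [2], [3], [4, 5]]
```

Only neighbouring partitions are merged, which matches the idea that coalesced partitions must stay contiguous so one reduce task can read them as a single range.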
Skew Join Parameters
spark.sql.adaptive.skewJoin.skewedPartitionFactor
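Multiplier applied to the median partition size (default 5). A partition is considered skewed only if its size exceeds this factor times the median and also exceeds spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes.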
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes
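Minimum size in bytes for a partition to be treated as skewed (default 256MB). The partition must also exceed spark.sql.adaptive.skewJoin.skewedPartitionFactor times the median partition size. Ideally this value is larger than spark.sql.adaptive.advisoryPartitionSizeInBytes.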
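The two skew parameters work together, as the following sketch shows. It is a minimal illustration under assumptions: the helper skewed_partitions is hypothetical, the partition sizes are made up, and Python's statistics.median stands in for whatever median calculation Spark uses internally.

```python
from statistics import median

MIB = 1024 ** 2


def skewed_partitions(sizes, factor=5, threshold_bytes=256 * MIB):
    """Return indices of partitions that satisfy both skew conditions.

    factor mirrors spark.sql.adaptive.skewJoin.skewedPartitionFactor and
    threshold_bytes mirrors ...skewJoin.skewedPartitionThresholdInBytes.
    """
    if not sizes:
        return []
    mid = median(sizes)
    return [
        index
        for index, size in enumerate(sizes)
        # Both conditions must hold: far above the median AND large in absolute terms.
        if size > factor * mid and size > threshold_bytes
    ]


large = [100 * MIB, 120 * MIB, 110 * MIB, 900 * MIB, 130 * MIB]
small = [10 * MIB, 12 * MIB, 11 * MIB, 90 * MIB, 13 * MIB]
print(skewed_partitions(large))  # [3]
print(skewed_partitions(small))  # []
```

The second case shows why the absolute threshold exists: a partition can be many times the median and still be too small to be worth splitting.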
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
