Key Spark Configuration Parameters and Their Explanations

This article presents a comprehensive list of essential Spark configuration settings—including executor memory, off‑heap memory, memory fractions, shuffle options, and adaptive query execution parameters—each accompanied by a concise description to help users fine‑tune Spark performance.

Big Data Technology & Architecture

Dec 23, 2021

This article provides a collection of Spark configuration settings with brief explanations, aimed at helping users optimize Spark applications.

Basic Configuration

spark.executor.memory

Amount of JVM heap memory allocated to each executor process (default 1g). Off-heap and overhead memory are configured separately.

spark.memory.offHeap.enabled

Toggle for enabling off‑heap memory usage.

spark.memory.offHeap.size

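Absolute amount of off-heap memory available for allocation (default 0). It must be set to a positive value when spark.memory.offHeap.enabled is true.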

spark.memory.fraction

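Fraction of (JVM heap minus 300 MB of reserved memory) shared by Spark's execution and storage memory (default 0.6). The rest is left for user data structures and internal metadata.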

spark.memory.storageFraction

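Fraction of the region set by spark.memory.fraction in which cached blocks are immune to eviction by execution (default 0.5). The boundary is soft: execution can borrow unused storage memory, and storage can borrow unused execution memory.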
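As a rough worked example of how the two fractions combine, the sketch below assumes Spark's unified memory model, in which about 300 MB of heap is reserved before the fractions apply, and uses the documented defaults of 0.6 and 0.5. The function name and the 4 GB heap are illustrative only; a real executor reports slightly less usable heap than the configured value.

```python
RESERVED_MB = 300  # heap Spark sets aside before applying spark.memory.fraction


def unified_memory_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Approximate the unified region and its eviction-protected storage share."""
    usable = max(heap_mb - RESERVED_MB, 0)
    unified = usable * memory_fraction
    storage = unified * storage_fraction
    return {
        "unified_mb": unified,
        "protected_storage_mb": storage,
        "execution_initial_mb": unified - storage,
    }


regions = unified_memory_mb(4096)
print(regions)
```

For a 4 GB heap this gives a unified region of roughly 2,278 MB, half of which is protected storage.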

spark.local.dir

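Directory for Spark's scratch space, including map output files and RDDs stored on disk (default /tmp). Cluster managers can override it, for example through SPARK_LOCAL_DIRS in standalone mode or LOCAL_DIRS on YARN.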

spark.cores.max

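Maximum total number of CPU cores the application may request across the whole cluster, not per machine. It applies to standalone mode and Mesos coarse-grained mode.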

spark.executor.cores

Number of cores each executor uses. Together with spark.task.cpus, it determines how many tasks an executor can run concurrently.

spark.task.cpus

Number of CPU cores reserved for each task (default 1).
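The relationship between spark.executor.cores and spark.task.cpus can be sketched as simple arithmetic. This is a minimal illustration, not Spark's scheduler: the function name and the num_executors input are hypothetical, and other resource constraints are ignored.

```python
def task_slots(num_executors, executor_cores, task_cpus=1):
    """Return how many tasks can run at once across all executors."""
    if task_cpus < 1 or executor_cores < task_cpus:
        raise ValueError("each executor needs at least spark.task.cpus cores")
    # An executor runs as many tasks as fit into its cores.
    per_executor = executor_cores // task_cpus
    return num_executors * per_executor


print(task_slots(num_executors=10, executor_cores=4))               # 40
print(task_slots(num_executors=10, executor_cores=4, task_cpus=2))  # 20
```

Raising spark.task.cpus therefore lowers concurrency, which is useful mainly when individual tasks are themselves multi-threaded.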

spark.default.parallelism

Default number of partitions in RDDs returned by transformations such as join, reduceByKey, and parallelize when the caller does not specify one.

spark.sql.shuffle.partitions

Number of partitions used when shuffling data for joins or aggregations in Spark SQL and DataFrame operations (default 200).
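One common way to apply settings like these is through spark-submit's --conf flag. The helper below is a small sketch that renders a dictionary of properties as such flags; the helper name and the values shown are illustrative assumptions, not recommendations.

```python
import shlex


def to_submit_flags(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    parts = []
    for key, value in sorted(conf.items()):
        parts.extend(["--conf", f"{key}={value}"])
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(p) for p in parts)


example_conf = {
    "spark.executor.memory": "4g",
    "spark.executor.cores": 4,
    "spark.sql.shuffle.partitions": 400,
}
flags = to_submit_flags(example_conf)
print(flags)
```

The same properties can also be placed in spark-defaults.conf or set programmatically when the session is built.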

Shuffle Configuration

spark.shuffle.file.buffer

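Size of the in-memory buffer for each shuffle file output stream (default 32k). A larger buffer reduces the number of disk seeks and system calls made while writing intermediate shuffle files.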

spark.reducer.maxSizeInFlight

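Maximum size of map outputs that each reduce task fetches simultaneously (default 48m). Because every fetched output needs a buffer to receive it, this is a fixed memory overhead per reduce task.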

spark.shuffle.sort.bypassMergeThreshold

When using SortShuffleManager, if there is no map-side aggregation and the number of reduce partitions is at most this threshold (default 200), Spark skips the merge sort: each map task writes one temporary file per reduce partition and then concatenates them into a single output file.
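The bypass condition can be expressed as a short predicate. This is a simplified sketch of the rule, not Spark's own code; the function and parameter names are chosen for illustration, and it assumes the bypass also requires that no map-side aggregation is performed.

```python
def should_bypass_merge_sort(num_reduce_partitions, map_side_combine, threshold=200):
    """Return True when the sort-based shuffle may skip its merge sort."""
    if map_side_combine:
        # Map-side aggregation needs sorted or combined records, so no bypass.
        return False
    return num_reduce_partitions <= threshold


print(should_bypass_merge_sort(150, map_side_combine=False))  # True
print(should_bypass_merge_sort(150, map_side_combine=True))   # False
print(should_bypass_merge_sort(500, map_side_combine=False))  # False
```

Raising the threshold trades sorting cost for more open files per map task, since the bypass path writes one file per reduce partition.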

Spark SQL Configuration

spark.sql.adaptive.enabled

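Toggle for enabling Adaptive Query Execution (AQE), which re-optimizes the query plan during execution using runtime statistics. It is enabled by default since Spark 3.2.0.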

spark.sql.adaptive.coalescePartitions.enabled

Whether AQE merges contiguous small shuffle partitions toward the advisory partition size (enabled by default; effective only when spark.sql.adaptive.enabled is true).

spark.sql.adaptive.advisoryPartitionSizeInBytes

Advisory size in bytes of a shuffle partition during adaptive optimization (default 64MB), used when coalescing small partitions or splitting skewed ones.

spark.sql.adaptive.coalescePartitions.minPartitionNum

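Suggested (not guaranteed) minimum number of shuffle partitions after coalescing. If unset, Spark falls back to the cluster's default parallelism.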
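To make the coalescing idea concrete, here is a simplified greedy sketch that groups adjacent shuffle partitions until adding the next one would exceed the advisory size. It is an assumption-laden illustration, not Spark's implementation: it ignores the minimum partition number and any special handling of small trailing partitions, and the sizes are made up.

```python
def coalesce_partitions(sizes, advisory_size):
    """Group adjacent partition sizes so each group stays near advisory_size."""
    groups = []
    current = []
    current_size = 0
    for size in sizes:
        # Close the current group if the next partition would overshoot.
        if current and current_size + size > advisory_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


MB = 1024 * 1024
sizes = [10 * MB, 20 * MB, 40 * MB, 5 * MB, 70 * MB, 3 * MB]
groups = coalesce_partitions(sizes, advisory_size=64 * MB)
print([sum(g) // MB for g in groups])  # [30, 45, 70, 3]
```

Only adjacent partitions are merged, so six input partitions become four here; a partition already larger than the advisory size is left on its own.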

spark.sql.adaptive.fetchShuffleBlocksInBatch

Whether to fetch contiguous shuffle blocks belonging to the same map task in one batch rather than one by one, which reduces I/O overhead.

spark.sql.adaptive.skewJoin.enabled

When enabled together with AQE, Spark handles skew in sort-merge joins automatically by splitting (and, if needed, replicating) skewed partitions.

Skew Join Parameters

spark.sql.adaptive.skewJoin.skewedPartitionFactor

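A partition is considered skewed if its size exceeds this factor (default 5) multiplied by the median partition size and also exceeds spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes.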

spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes

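Minimum size in bytes (default 256MB) that a partition must exceed to be treated as skewed, in addition to exceeding spark.sql.adaptive.skewJoin.skewedPartitionFactor times the median partition size.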
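The two skew parameters work together: a partition must exceed both the factor times the median size and the absolute threshold. The sketch below illustrates that rule under the assumption of the documented defaults (factor 5, threshold 256 MB); the function name and partition sizes are hypothetical, and Python's statistics.median stands in for Spark's own median calculation.

```python
from statistics import median

MB = 1024 * 1024


def skewed_partitions(sizes, factor=5, threshold=256 * MB):
    """Return indexes of partitions that satisfy both skew conditions."""
    mid = median(sizes)
    return [
        i
        for i, size in enumerate(sizes)
        if size > factor * mid and size > threshold
    ]


sizes = [100 * MB, 120 * MB, 90 * MB, 110 * MB, 900 * MB]
print(skewed_partitions(sizes))  # [4]
```

Here the median is 110 MB, so only the 900 MB partition clears both 550 MB (5 x median) and the 256 MB floor. A partition of 100 MB among 10 MB peers would pass the ratio test but not the threshold, and would be left alone.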

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
