Overview of Spark SQL Adaptive Execution Optimization Engine
This article explains Spark SQL's Adaptive Execution engine, covering its background, dynamic plan adjustments, shuffle partition tuning, data skew mitigation techniques, and the key configuration parameters needed to enable and fine‑tune adaptive query execution for improved performance.
Spark SQL Adaptive Execution (also known as Adaptive Query Execution, AQE) can dynamically optimize the execution plan based on runtime statistics, thereby improving overall query efficiency.
Adaptive Execution was first experimented in Spark 2.3 and officially released in Spark 3.0.
The core ideas are:
Execution plans can be adjusted dynamically at runtime.
Adjustments are driven by precise intermediate statistics collected during execution.
Shuffle partition count can be set via the spark.sql.shuffle.partition parameter (default 200). Too few partitions may cause large tasks, memory pressure, and OOM; too many partitions increase scheduling overhead and temporary files.
The Catalyst optimizer selects the best execution plan using a rule‑based optimizer (RBO) and, since Spark 2.2, a cost‑based optimizer (CBO). Once a plan is chosen, it remains static unless AQE provides runtime information to refine it.
Data skew occurs when a partition holds far more data than others, slowing the whole job. Common mitigation methods include increasing shuffle partitions, raising the BroadcastHashJoin threshold to convert SortMergeJoin to BroadcastHashJoin, and manually filtering or prefixing skewed keys.
Adaptive execution architecture follows the flow: sql → parse → logical plan → physical plan → RDD → job → DAG → stage → task run. While a static plan cannot be changed, AQE can modify join strategies and partitioning at runtime.
Adaptive partitioning criteria are based on per‑reducer memory size (default 64 MB) or row count (default 100 000 rows).
Dynamic plan adjustment can switch a SortMergeJoin to a BroadcastHashJoin when the small table fits the broadcast threshold, reducing network shuffle and mitigating skew.
Dynamic skew handling detects skewed partitions during stage execution by comparing each partition’s size or row count to the median multiplied by a configurable factor and to absolute thresholds.
Key configuration parameters (set in org.apache.spark.sql.internal.SQLConf) include: spark.sql.adaptive.enabled=true – enable AQE. spark.sql.adaptive.shuffle.targetPostShuffleInputSize – target size per reducer (default 64 MB). spark.sql.adaptive.minNumPostShufflePartitions – (removed in Spark 3.0) former partition count threshold. spark.sql.adaptive.forceApply – force AQE even when no shuffle or sub‑query is present. spark.sql.adaptive.logLevel – log level for AQE. spark.sql.adaptive.advisoryPartitionSizeInBytes – recommended partition size (same meaning as targetPostShuffleInputSize). spark.sql.adaptive.coalescePartitions.enabled – enable merging of small partitions. spark.sql.adaptive.coalescePartitions.minPartitionNum – minimum number of partitions after merging. spark.sql.adaptive.fetchShuffleBlocksInBatch – fetch shuffle blocks in batches to reduce I/O. spark.sql.adaptive.skewJoin.enabled – enable automatic skew handling for sort‑merge joins. spark.sql.adaptive.skewJoin.skewedPartitionFactor – factor to identify skewed partitions (default 10). spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes – size threshold for a partition to be considered skewed (default 256 MB).
References:
https://issues.apache.org/jira/browse/SPARK-23128
https://blog.csdn.net/weixin_34006468/article/details/91894261
https://www.cnblogs.com/zz-ksw/p/11254294.html
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
