Blaze: Design and Practice of SparkSQL Native Operator Optimization at Kuaishou
This article presents Blaze, a Kuaishou‑built native execution middleware for SparkSQL that leverages Apache DataFusion to achieve vectorized operator execution, detailing its architecture, implementation, performance gains, current coverage, benchmark results, production rollout, and future development plans.
The presentation introduces Blaze, a native execution engine for SparkSQL developed at Kuaishou, which improves Spark performance through both execution-plan optimization and a more efficient native runtime.
It first reviews Spark’s evolution: Spark 1.0’s interpreted execution, Spark 2.0’s WholeStageCodegen (multi‑operator compilation), and Spark 3.0’s Adaptive Query Execution, highlighting the growing importance of vectorized execution.
Several vectorization projects are surveyed, including Meta’s Velox, Intel’s Gluten, Databricks’ Photon, and Alibaba’s Native Codegen, each providing native columnar execution capabilities.
Blaze is described as a Rust‑based middleware built on Apache DataFusion, wrapping Spark’s physical plan and translating supported operators into native DataFusion plans via a Session Extension and JNI bridge.
The overall architecture shows Spark generating a physical plan, Blaze’s extension converting it to a native plan, and DataFusion executing it using Arrow columnar format, while unsupported operators remain in JVM execution.
Four core components are detailed: Blaze Session Extension (operator inspection and conversion), Plan SerDe (protobuf serialization of native plans), JNI Gateways (data and plan transfer), and Native Operators (mapping Spark operators to native implementations).
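To make the Plan SerDe component concrete, here is a minimal Python sketch of the idea: each physical-plan node is encoded into a nested descriptor that can cross the JNI boundary as bytes and be rebuilt on the native side. Blaze uses protobuf for this; JSON and the `PlanNode` class below are stand-ins chosen for illustration, not the actual Blaze schema.

```python
import json
from dataclasses import dataclass, field

# Hypothetical stand-in for Blaze's protobuf Plan SerDe: each physical-plan
# node becomes a nested descriptor that can cross the JNI boundary as bytes.
@dataclass
class PlanNode:
    op: str                      # operator name, e.g. "FilterExec"
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def serialize(node: PlanNode) -> bytes:
    """Encode a plan tree to bytes (JSON here; protobuf in Blaze)."""
    def to_dict(n):
        return {"op": n.op, "props": n.props,
                "children": [to_dict(c) for c in n.children]}
    return json.dumps(to_dict(node)).encode("utf-8")

def deserialize(buf: bytes) -> PlanNode:
    """Decode bytes back into a plan tree on the receiving side."""
    def from_dict(d):
        return PlanNode(d["op"], d["props"],
                        [from_dict(c) for c in d["children"]])
    return from_dict(json.loads(buf.decode("utf-8")))

plan = PlanNode("ProjectExec", {"exprs": ["a", "b"]},
                [PlanNode("FilterExec", {"pred": "a > 0"},
                          [PlanNode("ParquetScanExec", {"path": "/data/t"})])])
restored = deserialize(serialize(plan))
```

The round trip preserves the whole tree, which is the property the JNI gateways rely on: the JVM side only ever ships opaque bytes, and the native side reconstructs an executable plan.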
Physical‑plan conversion is rule‑based: rules decide which subtrees can run natively so that row‑to‑column (and column‑to‑row) transitions are inserted only at the boundaries between JVM and native execution; native plans are then serialized as protobuf descriptors and submitted, with native RDDs executing them via JNI.
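The conversion rule can be sketched as a bottom-up tree walk. The sketch below is hypothetical (the operator names and `SUPPORTED` set are illustrative, not Blaze's actual rule set): supported operators are rewritten into native nodes, and a transition node is inserted only where a native subtree meets a JVM operator, so a fully supported pipeline incurs no row/column conversions at all.

```python
# Hypothetical sketch of rule-based plan conversion: supported operators are
# rewritten into native nodes, and a single row<->column transition is
# inserted only where native and JVM execution meet.
SUPPORTED = {"FilterExec", "ProjectExec", "SortExec", "ParquetScanExec"}

def convert(node):
    """Bottom-up conversion; returns (new_plan, is_native)."""
    converted = [convert(child) for child in node["children"]]
    mine_native = node["op"] in SUPPORTED
    children = []
    for child, child_native in converted:
        if mine_native and not child_native:
            # JVM child feeding a native parent: bridge rows to columns.
            children.append({"op": "RowToColumnar", "children": [child]})
        elif not mine_native and child_native:
            # Native child feeding a JVM parent: bridge columns to rows.
            children.append({"op": "ColumnarToRow", "children": [child]})
        else:
            children.append(child)
    new_op = "Native" + node["op"] if mine_native else node["op"]
    return {"op": new_op, "children": children}, mine_native

scan = {"op": "ParquetScanExec", "children": []}
plan = {"op": "ProjectExec",
        "children": [{"op": "FilterExec", "children": [scan]}]}
native_plan, fully_native = convert(plan)
```

Because every operator in this example is supported, the converted tree contains no bridge nodes; inserting an unsupported operator in the middle would produce exactly one `ColumnarToRow`/`RowToColumnar` pair around it.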
UDF compatibility is achieved by packaging data in the Arrow IPC format, sending it to the JVM for Java UDF execution, and returning the results; processing whole batches per call amortizes the crossing overhead.
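The batching argument can be illustrated with a toy model. `jvm_udf_call` below is a hypothetical stand-in for one JNI crossing that runs a Java UDF; the point is that the fixed per-crossing cost is paid once per batch rather than once per row.

```python
# Toy model of the UDF bridge: one "JNI call" per batch instead of per row.
# jvm_udf_call is a hypothetical stand-in for shipping an Arrow batch to the
# JVM, running the Java UDF there, and receiving the results back.
CALLS = {"count": 0}

def jvm_udf_call(rows):
    """Simulate one JNI crossing: run the Java UDF on a whole batch."""
    CALLS["count"] += 1            # each call pays the fixed crossing cost once
    return [r * 2 for r in rows]   # the "Java UDF": double each value

def run_udf_batched(values, batch_size=4096):
    """Apply the UDF over the input in fixed-size batches."""
    out = []
    for i in range(0, len(values), batch_size):
        out.extend(jvm_udf_call(values[i:i + batch_size]))
    return out

result = run_udf_batched(list(range(10000)), batch_size=4096)
```

With 10,000 rows and a batch size of 4,096, only three crossings occur; a row-at-a-time bridge would make 10,000.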
Memory management tackles native memory limitations by implementing a two‑level spill strategy that first spills to JVM heap via JNI/NIO and then uses Spark’s MemoryManager for further spilling when needed.
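A minimal sketch of the two-level policy, with illustrative sizes and a simulated heap (the real implementation moves batches over JNI/NIO and coordinates with Spark's MemoryManager): when the native budget is exceeded, the oldest batches move to the heap tier, and only when the heap budget is also exceeded do they go to disk.

```python
# Hypothetical sketch of the two-level spill strategy: native -> (simulated)
# JVM heap -> disk. Budgets and batch sizes are illustrative only.
class TwoLevelSpill:
    def __init__(self, native_limit, heap_limit):
        self.native_limit, self.heap_limit = native_limit, heap_limit
        self.native, self.heap, self.disk = [], [], []  # (batch, size) tiers

    @staticmethod
    def _used(tier):
        return sum(size for _, size in tier)

    def add(self, batch, size):
        self.native.append((batch, size))
        while self._used(self.native) > self.native_limit:
            # Level 1: evict the oldest native batch to the JVM heap.
            self.heap.append(self.native.pop(0))
        while self._used(self.heap) > self.heap_limit:
            # Level 2: heap budget exhausted too, evict to disk.
            self.disk.append(self.heap.pop(0))

mgr = TwoLevelSpill(native_limit=100, heap_limit=100)
for name in ["b1", "b2", "b3", "b4", "b5"]:
    mgr.add(name, 60)   # 300 bytes total against 100 + 100 of memory budget
```

After the five inserts, one batch stays native, one sits on the heap, and the other three have cascaded to disk, which is the behavior the two levels are meant to produce: disk I/O happens only after both memory tiers are full.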
Additional optimizations include custom comparison operations using arrow‑row, faster sorting with Rust’s sort_unstable (PDQSort) and tournament‑tree merging, efficient hash maps via hashbrown, and a tailored columnar shuffle format to improve ZSTD compression.
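The merge step of external sorting can be sketched as a k-way merge of pre-sorted runs. Blaze uses a tournament tree for this; the heap-based version below is a simpler stand-in that plays the same role, where each "run" would be a sorted spill file in practice.

```python
import heapq

# Heap-based stand-in for the tournament-tree merge: combine k pre-sorted
# runs (e.g. sorted spill files) into one globally sorted stream.
def merge_runs(runs):
    """K-way merge of pre-sorted lists into one sorted list."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, run_idx, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(runs[run_idx]):
            # Refill from the run that just yielded its smallest element.
            heapq.heappush(heap, (runs[run_idx][pos + 1], run_idx, pos + 1))
    return out

merged = merge_runs([[1, 4, 7], [2, 5], [3, 6, 8]])
```

Both structures do O(log k) comparisons per emitted element; the tournament tree's advantage is fewer comparisons per step on the replay path, which matters when k and the row count are large.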
The project contributes back to the open‑source community by reporting enhancements to DataFusion, such as memory management, remote storage APIs, and operator optimizations.
Current progress shows high operator coverage for common workloads, benchmark results on TPC‑DS 1 TB demonstrating up to 10× speedup for individual queries and >2× overall improvement, and production rollout with up to 4× performance gains on selected ETL tasks.
Future work includes expanding operator and data‑type support, large‑scale internal deployment, abstracting interfaces for broader engine compatibility, and continued open‑source contributions.
DataFunSummit