DataFunSummit
Feb 1, 2025 · Big Data
Spark Native and Cloud Native: Vectorized SQL Engines, Remote Shuffle, and EMR Serverless Spark Practices
This article explains the challenges of big‑data processing in the cloud era, introduces Spark’s native‑language SQL engine rewrites, discusses vectorization and code generation techniques, describes cloud‑native storage‑compute separation with Remote Shuffle services such as Apache Celeborn, and presents the production benefits of Alibaba Cloud’s EMR Serverless Spark.
Big DataCloud NativeCodegen
0 likes · 12 min read