Tag: Codegen


DataFunSummit
Feb 1, 2025 · Big Data

Spark Native and Cloud Native: Vectorized SQL Engines, Remote Shuffle, and EMR Serverless Spark Practices

This article explains the challenges of big‑data processing in the cloud era and introduces rewrites of Spark's SQL engine in native languages, discussing vectorization and code generation techniques. It then describes cloud‑native storage‑compute separation with Remote Shuffle services such as Apache Celeborn, and presents the production benefits of Alibaba Cloud's EMR Serverless Spark.

Big Data · Cloud Native · Codegen
12 min read
DataFunSummit
Aug 7, 2023 · Big Data

Performance Optimizations in Impala for Data Lake Queries: Iceberg and Codegen Enhancements

This article presents a comprehensive overview of Impala's high‑performance MPP query engine and its architecture for data‑lake workloads. It then details performance optimizations, including Iceberg table format improvements, manifest caching, and Codegen techniques such as asynchronous compilation and caching of generated code.

Big Data · Codegen · Iceberg
17 min read