Tag

DataFusion


360 Tech Engineering
Oct 17, 2024 · Databases

Introducing DataFusion: A High‑Performance Rust‑Based Query Engine Powered by Apache Arrow

This article explains DataFusion, an Arrow‑based query engine written in Rust that offers high performance, extensibility, and integration with a variety of data sources. It covers the engine’s architecture and execution model, the advantages Rust brings, and practical usage examples for building modern data‑warehouse solutions.

Apache Arrow · Data Warehouse · DataFusion
15 min read
Kuaishou Tech
Sep 13, 2024 · Big Data

Blaze: Kuaishou’s Rust‑Based Vectorized Execution Engine for Spark SQL

Blaze is a DataFusion‑based vectorized execution engine, written in Rust by Kuaishou, that accelerates Spark SQL queries. It reports up to 60% faster computation and an average 30% compute‑power gain in production. Its architecture comprises a native engine, a protobuf protocol, a JNI bridge, and a Spark extension. Blaze is open source and compatible with Spark 3.0‑3.5.

Big Data · DataFusion · Rust
11 min read
360 Zhihui Cloud Developer
Sep 9, 2024 · Big Data

Why DataFusion is Revolutionizing Big Data Queries with Rust and Arrow

This article introduces DataFusion, a high‑performance query engine written in Rust that uses Apache Arrow’s columnar memory format for fast, extensible data processing across multiple storage formats and cloud sources. It explains the engine’s architecture and execution model and provides practical Rust code examples for custom extensions.

Apache Arrow · Big Data · DataFusion
16 min read
DataFunSummit
Jun 21, 2024 · Big Data

Building a Complete Data System with Apache Arrow: Architecture, Dynamic Schema Modeling, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow and its columnar in‑memory format, describes dynamic read‑time modeling, outlines the system’s execution flow, storage and indexing strategies, and shares practical tips and extensions for building scalable big‑data solutions.

Acero · Apache Arrow · Big Data
20 min read
DataFunSummit
Apr 23, 2024 · Big Data

Building a Data System with Apache Arrow: Design, Implementation, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow’s columnar in‑memory format and its zero‑copy advantages, describes how to model data at read time, outlines the execution flow with Acero and SQL planning, and shares practical tips and extensions for building robust, dynamic‑schema data platforms.

Acero · Apache Arrow · Big Data
20 min read
DataFunSummit
May 21, 2023 · Big Data

Blaze: Design and Practice of SparkSQL Native Operator Optimization at Kuaishou

This article presents Blaze, a native execution middleware for SparkSQL built by Kuaishou that uses Apache DataFusion to achieve vectorized operator execution. It details Blaze’s architecture and implementation, performance gains, current operator coverage, benchmark results, production rollout, and future development plans.

Big Data · DataFusion · Native Execution
17 min read