Tag

Apache Arrow


360 Tech Engineering
Oct 17, 2024 · Databases

Introducing DataFusion: A High‑Performance Rust‑Based Query Engine Powered by Apache Arrow

This article introduces DataFusion, a query engine written in Rust and built on Apache Arrow that offers high performance, extensibility, and integration with a variety of data sources. It covers the engine’s architecture and execution model, the advantages Rust brings, and practical usage examples for building modern data‑warehouse solutions.

Apache Arrow · Data Warehouse · DataFusion
15 min read
360 Zhihui Cloud Developer
Sep 9, 2024 · Big Data

Why DataFusion is Revolutionizing Big Data Queries with Rust and Arrow

This article introduces DataFusion, a high‑performance, Rust‑based query engine that uses Apache Arrow’s columnar memory format to enable fast, extensible data processing across multiple storage formats and cloud sources. It explains the engine’s architecture and execution model and provides practical Rust code examples for custom extensions.

Apache Arrow · Big Data · DataFusion
16 min read
Python Programming Learning Circle
Aug 13, 2024 · Big Data

What’s New in pandas 2.0: Arrow Backend, Copy‑On‑Write, and Performance Improvements

The article reviews pandas 2.0’s major upgrades: an Apache Arrow backend that speeds up CSV reads by over 30×, new Arrow dtypes, a nullable NumPy dtype for missing values, a copy‑on‑write memory model, and optional dependencies. It also includes benchmark comparisons with ydata‑profiling and highlights the library’s improved performance, flexibility, and interoperability for data‑intensive Python workflows.

Apache Arrow · Performance · Python
15 min read
DataFunSummit
Jun 21, 2024 · Big Data

Building a Complete Data System with Apache Arrow: Architecture, Dynamic Schema Modeling, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow and its columnar in‑memory format, describes dynamic read‑time modeling, outlines the system’s execution flow, storage and indexing strategies, and shares practical tips and extensions for building scalable big‑data solutions.

Acero · Apache Arrow · Big Data
20 min read
DataFunSummit
Apr 23, 2024 · Big Data

Building a Data System with Apache Arrow: Design, Implementation, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow’s columnar in‑memory format and its zero‑copy advantages, describes how to model data at read time, outlines the execution flow with Acero and SQL planning, and shares practical tips and extensions for building robust, dynamic‑schema data platforms.

Acero · Apache Arrow · Big Data
20 min read
Sohu Tech Products
Mar 6, 2024 · Big Data

Building Data Systems with Apache Arrow: Architecture, Memory Format, and Execution

The article explains how Apache Arrow’s columnar, cross‑language in‑memory format enables high‑performance, interoperable data systems as an alternative to traditional row‑oriented databases. It covers dynamic schemas, zero‑copy data exchange, efficient indexing, Acero‑based query execution, and Flight/ADBC connectivity, and it offers practical guidance while highlighting the remaining challenges.

Apache Arrow · Big Data · Data Systems
20 min read
DataFunTalk
Feb 28, 2024 · Big Data

Building a Data System with Apache Arrow: Design, Modeling, and Execution

This article explains why new data systems are needed, introduces Apache Arrow and its columnar in‑memory format, describes read‑time modeling and dynamic schema handling, and shows how Arrow can be used to build a complete data processing pipeline with indexing, SQL planning, and zero‑copy data exchange.

Apache Arrow · Big Data · Data Systems
20 min read
Sohu Tech Products
Jan 24, 2024 · Databases

Optimizing Database Expression Evaluation with JIT Technology Using Gandiva

The article explains how database expression evaluation, especially in WHERE and SELECT clauses, can be dramatically accelerated by replacing interpreted AST traversal with Just‑In‑Time compilation using Apache Gandiva, which uses LLVM to generate SIMD‑optimized machine code for Arrow columnar data. It also discusses extensions such as support for timestamps, arrays, higher‑order functions, and UDFs.

Apache Arrow · Apache Gandiva · Database Optimization
17 min read
DataFunTalk
Jan 15, 2024 · Databases

Optimizing Database Expression Evaluation with JIT Compilation Using Gandiva

This article explains how Just‑In‑Time (JIT) compilation, particularly via the Gandiva expression compiler built on LLVM and Apache Arrow, can dramatically accelerate database expression evaluation by transforming abstract syntax trees into native vectorized code, addressing traditional interpretation bottlenecks and improving CPU‑bound query performance.

Apache Arrow · Expression Evaluation · Gandiva
17 min read
DataFunTalk
Dec 8, 2023 · Databases

Interview with Wu Li on Database Evolution: Columnar Storage, JIT Compilation, and Push Mode

The article presents an interview with Wu Li, a research engineer at Shanghai Yanhuang Data, discussing how hardware limits have driven database evolution toward columnar storage, the adoption of Apache Arrow and Gandiva for SIMD‑enabled JIT compilation, and the shift from pull to push processing modes to improve OLAP performance.

Apache Arrow · Database Optimization · Gandiva
10 min read
DataFunSummit
Oct 24, 2023 · Big Data

Using Apache Arrow to Quickly Build Modern Data Systems

This announcement introduces Li Chenxi, a big‑data R&D engineer, and outlines his talk on using Apache Arrow’s columnar in‑memory format to build modern data systems with read‑time modeling, highlighting Arrow’s key features, its ecosystem, and practical implementation benefits.

Apache Arrow · Big Data · Columnar Memory
2 min read
DataFunSummit
Nov 20, 2021 · Artificial Intelligence

Design Dimensions of Next‑Generation AI Platforms: Programming Languages, Runtime Environments, and Model Deployment

The article examines three key design dimensions of modern AI platforms—choice of programming language, runtime environment isolation, and model deployment—highlighting how Python’s dominance, container‑based resource management, and efficient data sharing shape platform architecture and performance.

AI platforms · Apache Arrow · Kubernetes
13 min read
Big Data Technology Architecture
Aug 8, 2020 · Big Data

Performance Comparison of SparkR with Vectorized Execution Using Apache Arrow

This article compares SparkR’s performance with that of the native Spark APIs, shows the slowdown caused by serialization between the JVM and R, and demonstrates how enabling Apache Arrow’s vectorized execution in Spark 3.0 can speed up SparkR operations by tens of times.

Apache Arrow · Big Data · Performance
7 min read
Laravel Tech Community
Aug 1, 2020 · Big Data

Apache Arrow 1.0.0 Released with New Columnar Format Features

Apache Arrow 1.0.0, the 18th major release, introduces a binary‑stable columnar format along with metadata version V5, unsigned dictionary indices, a Feature enum, optional LZ4/ZStandard compression, a bitWidth field for decimal types, removal of validity bitmaps from union types, and broader language bindings for big‑data analytics.

Apache Arrow · Big Data · Release Notes
3 min read