Tagged articles
18 articles
360 Tech Engineering
Oct 17, 2024 · Databases

Introducing DataFusion: A High‑Performance Rust‑Based Query Engine Powered by Apache Arrow

This article explains DataFusion, a query engine written in Rust and built on Apache Arrow that offers high performance, extensibility, and integration with various data sources. It covers the engine's architecture, its execution model, the advantages of Rust, and practical usage examples for building modern data‑warehouse solutions.

Apache Arrow · Data Warehouse · DataFusion
0 likes · 15 min read
360 Zhihui Cloud Developer
Sep 9, 2024 · Big Data

Why DataFusion is Revolutionizing Big Data Queries with Rust and Arrow

This article introduces DataFusion, a high‑performance, Rust‑based query engine that uses Apache Arrow’s columnar memory format for fast, extensible data processing across multiple storage formats and cloud sources. It explains the engine's architecture and execution model and provides practical Rust code examples for custom extensions.

Apache Arrow · Big Data · DataFusion
0 likes · 16 min read
Sohu Tech Products
Mar 6, 2024 · Big Data

Building Data Systems with Apache Arrow: Architecture, Memory Format, and Execution

The article explains how Apache Arrow’s columnar, cross‑language in‑memory format enables high‑performance, interoperable data systems—replacing traditional row‑oriented databases—by supporting dynamic schemas, zero‑copy data exchange, efficient indexing, Acero‑based query execution, and Flight/ADBC connectivity, while offering practical guidance and highlighting challenges.

Apache Arrow · Big Data · Columnar Storage
0 likes · 20 min read
DataFunSummit
Aug 20, 2023 · Big Data

Kuaishou Data Service System: Modeling, Architecture, and Future Directions

This article presents Kuaishou's comprehensive data service system, covering its domain modeling, evolution from custom to unified services, the Octo query engine and data preparation platform architecture, the dual data API and analysis services, and future plans for intelligence and serverless high‑performance capabilities.

Big Data · Data Platform · Data Service
0 likes · 16 min read
JD Tech
Jan 13, 2023 · Big Data

UData: Solving the Last Mile of Data Usage – Architecture, Query Engine Design, and Federated Query Enhancements

This article introduces the UData platform, explains its data‑integration architecture, details the StarRocks‑based query engine workflow from SQL parsing to distributed execution, and describes recent optimizations such as computation push‑down, support for JSF/HTTP/ClickHouse external tables, and a proxy‑based federated query framework.

Big Data · Data Integration · Query Engine
0 likes · 20 min read
Big Data Technology & Architecture
Jun 17, 2021 · Big Data

Comprehensive Guide to Presto: Origins, Architecture, Optimization, and Real‑World Applications

This article provides an in‑depth overview of Presto, covering its history, core principles, architectural components, query optimization techniques, resource management, tuning tips, data model, and case studies from companies like Didi and Youzan, offering practical guidance for deploying and operating the distributed SQL engine at scale.

Presto · Query Engine · Resource Management
0 likes · 33 min read
JD Retail Technology
Apr 15, 2021 · Backend Development

How We Scaled JD’s UGC Platform with Elasticsearch: A Backend Architecture Deep Dive

This case study details how JD’s "Browse" UGC project evolved from rapid agile delivery to a performance bottleneck as data grew, and how introducing Elasticsearch, redesigning the query flow, and refactoring storage components restored fast, flexible searches for both front‑end and operations users.

Backend Architecture · Elasticsearch · JD UGC Platform
0 likes · 9 min read
Big Data Technology & Architecture
Jun 7, 2020 · Big Data

A Unified View of SQL‑on‑Hadoop Systems: Architecture, Execution Plans, Optimizations, and Storage Formats

The article provides a comprehensive overview of SQL‑on‑Hadoop query engines such as Hive, Impala, Presto and Spark SQL, comparing their runtime frameworks, core components, compilation steps, optimizer strategies, CPU/IO efficiency techniques, storage formats like ORC and Parquet, and resource management in a unified perspective.

Big Data · Query Engine · SQL on Hadoop
0 likes · 24 min read
Ctrip Technology
Jul 3, 2018 · Big Data

Ctrip's Presto Engine: Challenges, Improvements, and Upgrade Roadmap

This article details Ctrip's experience with the Presto distributed SQL engine, outlining the initial performance and stability issues, the comprehensive enhancements made in security, resource control, compatibility, and monitoring, and the multi‑stage upgrade plan that guides its future evolution.

Big Data · Kerberos · Performance Optimization
0 likes · 11 min read
Hulu Beijing
Feb 28, 2018 · Big Data

How Hulu’s Nesto Engine Delivers Near‑Real‑Time OLAP on TB‑Scale Data

This article introduces Hulu's in‑house OLAP engine Nesto, detailing its near‑real‑time data ingestion, nested data model, TB‑level storage using HBase and Parquet, MPP query execution, custom predicate library, and the overall architecture that enables sub‑second ad‑hoc queries for user analytics.

Big Data · Columnar Storage · Distributed Systems
0 likes · 22 min read

Architectural Overview and Optimization Techniques for SQL‑on‑Hadoop Systems

This article provides a comprehensive analysis of SQL‑on‑Hadoop architectures, comparing runtime‑framework‑based engines like Hive with MPP‑style engines such as Impala, detailing core components, compilation pipelines, optimizer strategies, CPU/IO performance tricks, columnar storage formats, and resource management in modern big‑data query platforms.

Columnar Storage · Query Engine · SQL on Hadoop
0 likes · 22 min read

Non‑Intrusive High‑Performance Complex Query Engine for HBase Using Secondary Multi‑Column Indexes

This article presents a non‑intrusive, high‑performance engine that adds secondary multi‑column indexes to Apache HBase, enabling efficient complex condition queries while preserving HBase's scalability, and details its principles, architecture, query API, index configuration, and practical trade‑offs.

Coprocessor · HBase · NoSQL
0 likes · 18 min read