Tag

Distributed Query

DataFunTalk
Jun 16, 2024 · Databases

Design and Optimization of REDgraph: Distributed Parallel Multi‑hop Query for Large‑Scale Social Graphs

This article presents the design, challenges, and performance‑focused optimizations of REDgraph, a large‑scale graph database used at Xiaohongshu, detailing its architecture, edge‑partitioning strategy, distributed parallel query implementation, and experimental results that demonstrate significant latency reductions for multi‑hop queries.

Distributed Query · REDgraph · graph database
25 min read

DataFunTalk
Oct 25, 2022 · Databases

Design and Implementation of ByteHouse Query Optimizer

The article explains how ByteHouse extends ClickHouse with a full‑featured query optimizer—including rule‑based and cost‑based techniques, analyzer modules, plan construction, and distributed optimization—to overcome ClickHouse limitations and achieve significant performance gains on complex OLAP workloads.

ByteHouse · CBO · Distributed Query
10 min read

DataFunSummit
Oct 7, 2022 · Databases

Optimizing Complex Queries in ClickHouse: Multi‑Stage Execution, Exchange Management, and Performance Enhancements

This article explains how ByteHouse (a heavily optimized ClickHouse variant) tackles complex query challenges by introducing a multi‑stage execution model, exchange mechanisms, runtime filters, and network optimizations, and it presents performance results and future directions for large‑scale OLAP workloads.

ByteHouse · ClickHouse · Database Optimization
21 min read

DataFunTalk
Sep 5, 2022 · Databases

Optimizing Complex Queries in ClickHouse: Multi‑Stage Execution, Exchange Management, and Runtime Filters

This article explains how ByteHouse, a heavily optimized ClickHouse variant, addresses complex query challenges by introducing a multi‑stage execution model, sophisticated exchange management, various join strategies, runtime filters, and diagnostic metrics to improve performance, scalability, and resource utilization in large‑scale data environments.

ByteHouse · ClickHouse · Distributed Query
21 min read

DataFunSummit
May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Data Warehouse
15 min read

Shopee Tech Team
Sep 23, 2021 · Big Data

Design and Architecture of the Boussole Real-Time Multi-Dimensional Data Analysis Engine

Boussole is Shopee’s real‑time analytics engine. It encodes each dimension as key‑value pairs stored primarily in HBase, pre‑aggregates selected dimension combinations, hashes metrics and tags, and executes distributed PromQL queries with a CockroachDB‑inspired executor. It also applies Delta‑of‑Delta compression and point capping, and it continues to evolve with adaptive pre‑aggregation and new storage models to keep query latency in the milliseconds for large‑scale multi‑dimensional analysis.

Big Data · Distributed Query · Pre-aggregation
24 min read

Aikesheng Open Source Community
Jan 13, 2020 · Databases

Using Explain to Analyze and Optimize Distributed SQL Queries in DBLE

This article demonstrates how to use the Explain feature in DBLE to visualize distributed query plans, compare join orders across multiple SQL statements, identify Cartesian products and ordered joins, and apply optimization insights for better performance in multi‑table queries.

DBLE · Distributed Query · EXPLAIN
6 min read

360 Tech Engineering
Sep 4, 2019 · Big Data

XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑barrier, highly stable distributed query engine that supports federated queries across heterogeneous data sources. It offers push‑down optimization, decentralized metadata, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

Big Data · Distributed Query · Open-source
14 min read

Architecture Digest
Mar 25, 2016 · Big Data

Design, Evolution, and Performance Evaluation of the PINGO Distributed Interactive Query Platform

This article details the motivation, architectural iterations, caching strategies, SparkSQL enhancements, and performance benchmarks of Baidu's PINGO platform, illustrating how it transformed from a Hive‑based QueryEngine into a high‑performance, Spark‑driven interactive query system for large‑scale data analysis.

Big Data · Distributed Query · PINGO
14 min read