Tagged articles

Distributed Query

19 articles

Dec 21, 2025 · Backend Development

How Elasticsearch Scales to Billions of Queries: Sharding, Inverted Index, Distributed Execution, and Replication

Elasticsearch achieves billion‑scale search performance by combining horizontal sharding, immutable inverted‑index segments, a two‑stage distributed query/fetch model run through a coordinating node, and multiple replicas per shard, which together provide high concurrency, scalability, and availability.

Distributed Query · Elasticsearch · Sharding

4 min read

DataFunSummit

Nov 5, 2025 · Databases

How REDgraph Supercharges Query Performance for Massive Social Networks

This article explains how Xiaohongshu built the REDgraph graph database to tackle ultra‑large social network queries, compares graph databases with traditional relational databases, showcases a Gremlin example, and highlights the scalability and efficiency benefits of storing relationships as first‑class citizens.

Distributed Query · Gremlin · NoSQL

6 min read

Tech Freedom Circle

Sep 1, 2025 · Databases

How ClickHouse Executes GROUP BY and Handles Real‑Time Analytics on Billions of Rows

This article explains ClickHouse’s core architecture—including its storage‑compute integration, MPP parallelism, columnar storage, vectorized execution, data pre‑sorting, table engines, sparse and auxiliary indexes, and the two‑stage aggregation pipeline—then walks through the exact GROUP BY execution flow for both local and distributed tables, illustrating each step with diagrams, SQL demos, and code snippets.

ClickHouse · Columnar Storage · Distributed Query

29 min read

Senior Tony

Oct 12, 2024 · Backend Development

When Monolith Meets Microservices: API Composition vs CQRS for Complex Queries

This article compares the API composition and CQRS patterns for handling distributed queries in evolving monolithic systems, illustrating their workflows with e‑commerce and online‑education examples, discussing performance trade‑offs and implementation details using Canal and Elasticsearch, and offering practical guidance on when to adopt each approach.

API composition · CQRS · Canal

8 min read

dbaplus Community

Aug 20, 2024 · Databases

How REDgraph Cut Multi‑Hop Query Latency by 50% with Distributed Parallel Execution

Xiaohongshu's REDgraph graph database faced high latency for multi‑hop queries, so the storage team redesigned the query framework using MPP‑inspired distributed parallel execution, edge‑partitioning, operator forwarding, and caching, achieving over 50% latency reduction and making three‑hop queries viable for online services.

Distributed Query · Optimization · REDgraph

30 min read

MaGe Linux Operations

Aug 9, 2024 · Operations

Mastering Elasticsearch Data Sync and Cluster Architecture: Strategies & Best Practices

This article explains how to keep MySQL and Elasticsearch data in sync using synchronous calls, asynchronous notifications, or binlog listeners, and dives deep into Elasticsearch cluster design, node roles, distributed storage, query phases, split‑brain handling, and fault‑tolerance mechanisms.

Cluster Architecture · Data synchronization · Distributed Query

8 min read

DataFunTalk

Jun 16, 2024 · Databases

Design and Optimization of REDgraph: Distributed Parallel Multi‑hop Query for Large‑Scale Social Graphs

This article presents the design, challenges, and performance‑focused optimizations of REDgraph, a large‑scale graph database used at Xiaohongshu, detailing its architecture, edge‑partitioning strategy, distributed parallel query implementation, and experimental results that demonstrate significant latency reductions for multi‑hop queries.

Distributed Query · REDgraph · graph database

25 min read

StarRing Big Data Open Lab

Mar 17, 2023 · Big Data

How Data Federation Transforms Enterprise Data Integration and Analytics

This article explains the concept of data federation, its advantages over traditional ETL, key architectural components, practical use cases such as virtual ODS, data staging, warehouse extension, heterogeneous migration, and compares Presto and Trino as distributed query engines for unified, secure, and low‑cost data access.

Distributed Query · ETL alternative · Trino

21 min read

DataFunTalk

Oct 25, 2022 · Databases

Design and Implementation of ByteHouse Query Optimizer

The article explains how ByteHouse extends ClickHouse with a full‑featured query optimizer—including rule‑based and cost‑based techniques, analyzer modules, plan construction, and distributed optimization—to overcome ClickHouse limitations and achieve significant performance gains on complex OLAP workloads.

ByteHouse · CBO · Distributed Query

10 min read

DataFunSummit

Oct 7, 2022 · Databases

Optimizing Complex Queries in ClickHouse: Multi‑Stage Execution, Exchange Management, and Performance Enhancements

This article explains how ByteHouse (a heavily optimized ClickHouse variant) tackles complex query challenges by introducing a multi‑stage execution model, exchange mechanisms, runtime filters, and network optimizations, and it presents performance results and future directions for large‑scale OLAP workloads.

ByteHouse · ClickHouse · Distributed Query

21 min read

ITPUB

Sep 12, 2022 · Databases

How ByteHouse Transforms ClickHouse for Complex Queries: Multi‑Stage Execution and Real‑World Optimizations

This article explains how ByteHouse, a heavily optimized fork of ClickHouse, introduces a multi‑stage execution model, advanced exchange mechanisms, and runtime filters to overcome the limitations of the original two‑stage query flow, delivering significant performance gains for complex joins, aggregations, and large‑scale analytics workloads.

ByteHouse · ClickHouse · Database Engineering

22 min read

DataFunSummit

May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Cloud Native

15 min read

Shopee Tech Team

Sep 23, 2021 · Big Data

Design and Architecture of the Boussole Real-Time Multi-Dimensional Data Analysis Engine

Boussole is Shopee’s real‑time analytics engine that transforms each dimension into key‑value pairs stored primarily in HBase, pre‑aggregates selected dimension combos, hashes metrics and tags, executes distributed PromQL queries with a CockroachDB‑inspired executor, applies Delta‑of‑Delta compression and point‑capping, and continues to evolve with adaptive pre‑aggregation and new storage models to maintain millisecond latency for massive multi‑dimensional analysis.

Distributed Query · Pre-aggregation · PromQL

24 min read

Aikesheng Open Source Community

Jan 13, 2020 · Databases

Using Explain to Analyze and Optimize Distributed SQL Queries in DBLE

This article demonstrates how to use the Explain feature in DBLE to visualize distributed query plans, compare join orders across multiple SQL statements, identify Cartesian products and ordered joins, and apply optimization insights for better performance in multi‑table queries.

DBLE · Distributed Query · EXPLAIN

6 min read

360 Tech Engineering

Sep 4, 2019 · Big Data

XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑barrier, highly stable distributed query engine that supports federated queries across heterogeneous data sources, offering push‑down optimization, metadata decentralization, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

Big Data · Distributed Query · SQL Federation

14 min read

ITPUB

Jun 22, 2017 · Databases

Boost Oracle Distributed Queries with the DRIVING_SITE Hint

This article explains how the DRIVING_SITE hint can reduce network traffic in Oracle distributed queries by pushing small tables to the remote site, demonstrates setup steps, compares execution times with and without the hint, and provides concrete PL/SQL scripts for performance testing.

DBLINK · Distributed Query · Oracle

9 min read

ITPUB

Mar 23, 2017 · Databases

How to Slash Distributed Oracle Query Time with the driving_site Hint

This article explains how to use Oracle's driving_site hint to minimize network traffic in distributed DBLINK queries, demonstrates a banking case where execution time drops from over eight seconds to under one second, and provides step‑by‑step view‑based solutions for DML optimization.

Databases · Distributed Query · Oracle

8 min read

Architecture Digest

Mar 25, 2016 · Big Data

Design, Evolution, and Performance Evaluation of the PINGO Distributed Interactive Query Platform

This article details the motivation, architectural iterations, caching strategies, SparkSQL enhancements, and performance benchmarks of Baidu's PINGO platform, illustrating how it transformed from a Hive‑based QueryEngine into a high‑performance, Spark‑driven interactive query system for large‑scale data analysis.

Caching · Distributed Query · PINGO

14 min read

dbaplus Community

Feb 21, 2016 · Databases

Boost Oracle Distributed Query Performance with Collocated Inline Views and Hints

This article explains how to optimize Oracle distributed queries that involve remote tables by minimizing remote calls, reducing result set size, and improving execution plans through techniques such as collocated inline views, CBO behavior, and driving_site hints, illustrated with detailed examples and performance measurements.

Collocated View · Database Performance · Distributed Query

11 min read