Tagged articles
19 articles
Mike Chen's Internet Architecture
Dec 21, 2025 · Backend Development

How Elasticsearch Scales to Billions of Queries: Sharding, Inverted Index, Distributed Execution, and Replication

Elasticsearch achieves billion‑scale search performance by combining horizontal sharding, immutable inverted‑index segments, a two‑stage distributed query‑then‑fetch model coordinated by a coordinating node, and multiple replicas, which together provide high concurrency, scalability, and availability.

Distributed Query · Elasticsearch · Replication
0 likes · 4 min read
DataFunSummit
Nov 5, 2025 · Databases

How REDgraph Supercharges Query Performance for Massive Social Networks

This article explains how Xiaohongshu built the REDgraph graph database to tackle ultra‑large social network queries, compares graph databases with traditional relational databases, showcases a Gremlin example, and highlights the scalability and efficiency benefits of storing relationships as first‑class citizens.

Distributed Query · Gremlin · NoSQL
0 likes · 6 min read
Tech Freedom Circle
Sep 1, 2025 · Databases

How ClickHouse Executes GROUP BY and Handles Real‑Time Analytics on Billions of Rows

This article explains ClickHouse’s core architecture—including its storage‑compute integration, MPP parallelism, columnar storage, vectorized execution, data pre‑sorting, table engines, sparse and auxiliary indexes, and the two‑stage aggregation pipeline—then walks through the exact GROUP BY execution flow for both local and distributed tables, illustrating each step with diagrams, SQL demos, and code snippets.

ClickHouse · Columnar Storage · Distributed Query
0 likes · 29 min read
Senior Tony
Oct 12, 2024 · Backend Development

When Monolith Meets Microservices: API Composition vs CQRS for Complex Queries

This article compares API composition and CQRS patterns for handling distributed queries in evolving monolithic systems, illustrating their workflows with e‑commerce and online‑education examples, discussing performance trade‑offs, implementation details using Canal and Elasticsearch, and offering practical guidance on when to adopt each approach.

API composition · Backend Architecture · CQRS
0 likes · 8 min read
dbaplus Community
Aug 20, 2024 · Databases

How REDgraph Cut Multi‑Hop Query Latency by 50% with Distributed Parallel Execution

Xiaohongshu's REDgraph graph database faced high latency for multi‑hop queries, so the storage team redesigned the query framework using MPP‑inspired distributed parallel execution, edge‑partitioning, operator forwarding, and caching, achieving over 50% latency reduction and making three‑hop queries viable for online services.

Distributed Query · REDgraph · graph-database
0 likes · 30 min read
MaGe Linux Operations
Aug 9, 2024 · Operations

Mastering Elasticsearch Data Sync and Cluster Architecture: Strategies & Best Practices

This article explains how to keep MySQL and Elasticsearch data in sync using synchronous calls, asynchronous notifications, or binlog listeners, and dives deep into Elasticsearch cluster design, node roles, distributed storage, query phases, split‑brain handling, and fault‑tolerance mechanisms.

Cluster Architecture · Distributed Query · Elasticsearch
0 likes · 8 min read
DataFunTalk
Jun 16, 2024 · Databases

Design and Optimization of REDgraph: Distributed Parallel Multi‑hop Query for Large‑Scale Social Graphs

This article presents the design, challenges, and performance‑focused optimizations of REDgraph, a large‑scale graph database used at Xiaohongshu, detailing its architecture, edge‑partitioning strategy, distributed parallel query implementation, and experimental results that demonstrate significant latency reductions for multi‑hop queries.

Distributed Query · REDgraph · Scalability
0 likes · 25 min read
StarRing Big Data Open Lab
Mar 17, 2023 · Big Data

How Data Federation Transforms Enterprise Data Integration and Analytics

This article explains the concept of data federation, its advantages over traditional ETL, key architectural components, practical use cases such as virtual ODS, data staging, warehouse extension, heterogeneous migration, and compares Presto and Trino as distributed query engines for unified, secure, and low‑cost data access.

Distributed Query · ETL alternative · Presto
0 likes · 21 min read
DataFunTalk
Oct 25, 2022 · Databases

Design and Implementation of ByteHouse Query Optimizer

The article explains how ByteHouse extends ClickHouse with a full‑featured query optimizer—including rule‑based and cost‑based techniques, analyzer modules, plan construction, and distributed optimization—to overcome ClickHouse limitations and achieve significant performance gains on complex OLAP workloads.

ByteHouse · CBO · Distributed Query
0 likes · 10 min read
DataFunSummit
Oct 7, 2022 · Databases

Optimizing Complex Queries in ClickHouse: Multi‑Stage Execution, Exchange Management, and Performance Enhancements

This article explains how ByteHouse (a heavily optimized ClickHouse variant) tackles complex query challenges by introducing a multi‑stage execution model, exchange mechanisms, runtime filters, and network optimizations, and it presents performance results and future directions for large‑scale OLAP workloads.

ByteHouse · ClickHouse · Database Optimization
0 likes · 21 min read
ITPUB
Sep 12, 2022 · Databases

How ByteHouse Transforms ClickHouse for Complex Queries: Multi‑Stage Execution and Real‑World Optimizations

This article explains how ByteHouse, a heavily optimized fork of ClickHouse, introduces a multi‑stage execution model, advanced exchange mechanisms, and runtime filters to overcome the limitations of the original two‑stage query flow, delivering significant performance gains for complex joins, aggregations, and large‑scale analytics workloads.

ByteHouse · ClickHouse · Database Engineering
0 likes · 22 min read
DataFunSummit
May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Cloud Native
0 likes · 15 min read
Shopee Tech Team
Sep 23, 2021 · Big Data

Design and Architecture of the Boussole Real-Time Multi-Dimensional Data Analysis Engine

Boussole is Shopee’s real‑time analytics engine. It transforms each dimension into key‑value pairs stored primarily in HBase, pre‑aggregates selected dimension combinations, hashes metrics and tags, executes distributed PromQL queries with a CockroachDB‑inspired executor, and applies Delta‑of‑Delta compression and point‑capping. It continues to evolve with adaptive pre‑aggregation and new storage models to maintain millisecond latency for massive multi‑dimensional analysis.

Distributed Query · Pre-aggregation · PromQL
0 likes · 24 min read
360 Tech Engineering
Sep 4, 2019 · Big Data

XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑barrier, highly stable distributed query engine that supports federated queries across heterogeneous data sources, offering push‑down optimization, metadata decentralization, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

Big Data · Distributed Query · SQL Federation
0 likes · 14 min read
ITPUB
Jun 22, 2017 · Databases

Boost Oracle Distributed Queries with the DRIVING_SITE Hint

This article explains how the DRIVING_SITE hint can reduce network traffic in Oracle distributed queries by pushing small tables to the remote site, demonstrates setup steps, compares execution times with and without the hint, and provides concrete PL/SQL scripts for performance testing.

DBLINK · Distributed Query · Oracle
0 likes · 9 min read
ITPUB
Mar 23, 2017 · Databases

How to Slash Distributed Oracle Query Time with the driving_site Hint

This article explains how to use Oracle's driving_site hint to minimize network traffic in distributed DBLINK queries, demonstrates a banking case where execution time drops from over eight seconds to under one second, and provides step‑by‑step view‑based solutions for DML optimization.

Distributed Query · Oracle · SQL
0 likes · 8 min read
Architecture Digest
Mar 25, 2016 · Big Data

Design, Evolution, and Performance Evaluation of the PINGO Distributed Interactive Query Platform

This article details the motivation, architectural iterations, caching strategies, SparkSQL enhancements, and performance benchmarks of Baidu's PINGO platform, illustrating how it transformed from a Hive‑based QueryEngine into a high‑performance, Spark‑driven interactive query system for large‑scale data analysis.

Distributed Query · PINGO · Performance Evaluation
0 likes · 14 min read
dbaplus Community
Feb 21, 2016 · Databases

Boost Oracle Distributed Query Performance with Collocated Inline Views and Hints

This article explains how to optimize Oracle distributed queries that involve remote tables by minimizing remote calls, reducing result set size, and improving execution plans through techniques such as collocated inline views, CBO behavior, and driving_site hints, illustrated with detailed examples and performance measurements.

Collocated View · Database Performance · Distributed Query
0 likes · 11 min read