Why ClickHouse Outperforms Other Databases: Core Features and Architecture Explained
This article explains ClickHouse’s MPP columnar design, complete DBMS capabilities, columnar storage, vectorized execution, multi‑master architecture, real‑time queries, sharding, and performance‑focused hardware and algorithm choices that together deliver its superior speed.
Core Architecture
ClickHouse is an MPP (Massively Parallel Processing) column-oriented DBMS. Each node works on its own local resources (share-nothing) and communicates with other nodes over the network. This gives near-linear horizontal scalability and avoids a single point of failure. The cluster follows a multi-master design: any node can accept client connections and serve the same queries.
DBMS Capabilities
DDL: create/alter/drop databases, tables, and views without restarting.
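DML: INSERT and SELECT, plus UPDATE and DELETE implemented as asynchronous ALTER TABLE ... UPDATE/DELETE mutations (newer releases also offer a lightweight DELETE FROM). The SQL dialect is close to standard SQL but is not fully ANSI-compliant.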
Fine‑grained permission control.
Backup/restore utilities.
Cluster management for distributed deployments (the cluster topology is declared in the server configuration).
Columnar Storage & Compression
Data is stored by column, so queries read only the required columns. This reduces I/O dramatically for wide tables and enables aggressive compression (e.g., LZ4, ZSTD), further speeding up scans.
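The I/O and compression argument can be checked with a small Python sketch. The table shape is hypothetical (20 Int64 columns and 10,000 rows, with one low-cardinality column), and zlib stands in for LZ4/ZSTD because those codecs are not in the Python standard library. The sketch counts the bytes that a query touching only column 0 must read under each layout.

```python
import random
import struct
import zlib

NUM_ROWS, NUM_COLS = 10_000, 20  # hypothetical wide table

def make_rows(seed: int = 0):
    rng = random.Random(seed)
    # Column 0 is a low-cardinality "status" code; the rest are random 62-bit ids.
    return [[rng.randrange(4)] + [rng.getrandbits(62) for _ in range(NUM_COLS - 1)]
            for _ in range(NUM_ROWS)]

def row_layout(rows) -> bytes:
    # Row store: all values of one row are contiguous on disk.
    return b"".join(struct.pack(f"<{len(r)}q", *r) for r in rows)

def column_layout(rows) -> list:
    # Column store: one contiguous buffer (one "file") per column.
    return [struct.pack(f"<{len(rows)}q", *(r[c] for r in rows)) for c in range(NUM_COLS)]

rows = make_rows()
row_bytes = row_layout(rows)
cols = column_layout(rows)

# A query on column 0 must scan every row in the row store,
# but only a single column buffer in the column store.
print("row store bytes read:   ", len(row_bytes))
print("column store bytes read:", len(cols[0]))
# Similar values sit next to each other, so a column compresses well.
print("column 0 compressed:    ", len(zlib.compress(cols[0])))
```

The random id columns barely compress at all, while the low-cardinality column shrinks to a small fraction of its size; storing each column separately is what lets a codec exploit that.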
Vectorized Execution Engine
ClickHouse exploits CPU SIMD instructions to process several values per instruction and runs multiple threads per query. Vectorized functions operate on whole columns, one block at a time, rather than row by row, which can yield large speedups for text transformation, filtering, JSON handling, and compression.
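Block-at-a-time evaluation can be contrasted with row-at-a-time evaluation in a short Python sketch. Python will not emit SIMD instructions, so this only shows the shape of the interface: the filter is called once per block of a column instead of once per value, which is the loop structure a C++ compiler can vectorize. BLOCK_SIZE and all function names are hypothetical.

```python
from array import array
from itertools import compress

BLOCK_SIZE = 8192  # hypothetical number of rows per block

def greater_than_row(value: int, threshold: int) -> bool:
    # Row-at-a-time: the engine pays one function call per value.
    return value > threshold

def greater_than_block(column: array, threshold: int) -> list:
    # Block-at-a-time: one call evaluates a whole column chunk in a tight loop.
    # In compiled C++ this inner loop is what the compiler can turn into SIMD.
    return [v > threshold for v in column]

def sum_where_rows(values: array, threshold: int) -> int:
    total = 0
    for v in values:
        if greater_than_row(v, threshold):
            total += v
    return total

def sum_where_blocks(values: array, threshold: int) -> int:
    total = 0
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]     # one column chunk
        mask = greater_than_block(block, threshold)  # one call per block
        total += sum(compress(block, mask))          # keep values where mask is True
    return total

values = array("q", range(100_000))
print(sum_where_rows(values, 50_000), sum_where_blocks(values, 50_000))
```

Both functions return the same sum; the block version makes about 13 filter calls for 100,000 values instead of 100,000, which is the per-value overhead vectorized engines amortize.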
Hardware‑Centric Optimizations
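GROUP BY is performed in memory by default, using hash tables tuned for CPU-cache efficiency (with different hash table variants chosen by key type); when a table is small enough to stay in the L3 cache, lookups avoid main-memory latency.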
Algorithm selection is data‑size aware: small sets use arrays, medium sets use hash sets, massive sets use HyperLogLog for distinct counting.
String search uses the most suitable algorithm:
Constant‑pattern search → Volnitsky.
Variable‑pattern search → SIMD‑based brute‑force.
Regex → RE2 or Hyperscan.
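The constant-pattern case is the easiest to illustrate. Volnitsky's algorithm indexes every 2-byte window (bigram) of the needle in a hash table, then samples the haystack only every (needle length - 1) bytes; every occurrence must contain one sampled bigram, so each hash hit is a candidate that is then verified in full. The sketch below is a simplified Python rendering of that idea, assuming a plain dict as the hash table; ClickHouse's C++ implementation is more elaborate, and the name volnitsky_find is hypothetical.

```python
from collections import defaultdict

def volnitsky_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    m = len(needle)
    if m < 2:
        # A bigram index needs at least two bytes; fall back to the builtin search.
        return haystack.find(needle)
    # Hash table: each 2-byte window of the needle -> offsets where it occurs.
    table = defaultdict(list)
    for offset in range(m - 1):
        table[needle[offset:offset + 2]].append(offset)
    # Any occurrence spans m - 1 bigram positions, so sampling every
    # (m - 1) bytes is guaranteed to land inside it.
    step = m - 1
    for i in range(0, len(haystack) - 1, step):
        offsets = table.get(haystack[i:i + 2])
        if not offsets:
            continue
        # Larger offsets give earlier start positions; check those first.
        for offset in reversed(offsets):
            start = i - offset
            if start >= 0 and haystack[start:start + m] == needle:
                return start
    return -1

print(volnitsky_find(b"the quick brown fox", b"brown"))  # prints 10
```

The speedup comes from the stride: for a long needle, most of the haystack is skipped with a single hash lookup per step.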
Table Engines, Sharding & Distributed Queries
Storage is abstracted behind IStorage implementations (table engines). Common engines include MergeTree, TinyLog, and system engines. Two logical table types are used:
Local table: stores data on a single shard (one physical server).
Distributed table: contains no data itself; it proxies one or more local tables, enabling transparent sharding and parallel query execution across the cluster.
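Each shard holds a portion of the data and can consist of one or more replicas. Adding shards scales out horizontally, while adding replicas provides redundancy. For example, a cluster with N nodes could run N shards with one replica each, or N/2 shards with two replicas each.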
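The routing and fan-out idea can be sketched in Python. ClickHouse's Distributed engine evaluates a user-supplied sharding expression and picks a shard from the remainder modulo the total shard weight; this sketch simplifies that to a CRC32 hash of a hypothetical user_id key modulo the number of equally weighted shards. Shard, DistributedTable, and count_by are illustrative names, not ClickHouse APIs.

```python
import zlib
from collections import Counter

class Shard:
    """Stands in for a local table: the rows stored on one node."""
    def __init__(self):
        self.rows = []

class DistributedTable:
    """Stores nothing itself; routes inserts and fans queries out to shards."""
    def __init__(self, shards, sharding_key):
        self.shards = shards
        self.sharding_key = sharding_key

    def insert(self, row: dict) -> None:
        # Deterministic hash of the sharding key, modulo the number of shards.
        key = str(self.sharding_key(row)).encode()
        self.shards[zlib.crc32(key) % len(self.shards)].rows.append(row)

    def count_by(self, column: str) -> Counter:
        # Scatter: every shard aggregates its own rows (in parallel on a real cluster).
        partials = [Counter(row[column] for row in shard.rows) for shard in self.shards]
        # Gather: the node that received the query merges the partial results.
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total

shards = [Shard() for _ in range(3)]
events = DistributedTable(shards, sharding_key=lambda row: row["user_id"])
for i in range(1_000):
    events.insert({"user_id": i % 100, "country": ("DE", "US", "JP")[i % 3]})
print(events.count_by("country"))
```

Because the key is hashed deterministically, all rows for one user_id land on the same shard, so per-key aggregations can finish locally before the merge step.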
Query Processing Pipeline
Client sends SQL through the HTTP or native TCP interface.
Parser builds an abstract syntax tree (AST) from the query.
Interpreter walks the AST and builds an execution pipeline.
IStorage (the table engine) reads the raw column data the query needs.
Data is passed between stages as Block objects. A Block is a set of columns, and each column entry is a triple of the column data (IColumn), its data type (IDataType), and the column name.
Vectorized functions and aggregations run on these blocks, producing results that are sent back to the client.
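The Block structure and the read, function, aggregate flow can be sketched in Python. The class names mirror ClickHouse's ColumnWithTypeAndName and Block only loosely; read, apply_function, and the in-memory table dict are hypothetical stand-ins for IStorage and the function machinery. The example evaluates roughly SELECT sum(length(url)) one block at a time.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator

@dataclass
class ColumnWithTypeAndName:
    column: list  # the values (the role IColumn plays)
    type: str     # the type name (the role IDataType plays)
    name: str

@dataclass
class Block:
    columns: list = field(default_factory=list)

    def get(self, name: str) -> ColumnWithTypeAndName:
        return next(c for c in self.columns if c.name == name)

def read(table: dict, names: list, block_size: int) -> Iterator[Block]:
    """Hypothetical storage read: yield blocks that hold only the requested columns."""
    num_rows = len(table[names[0]][1])
    for start in range(0, num_rows, block_size):
        yield Block([ColumnWithTypeAndName(table[n][1][start:start + block_size], table[n][0], n)
                     for n in names])

def apply_function(block: Block, fn: Callable, arg: str, result: str, result_type: str) -> Block:
    """Ordinary function: evaluate fn over a whole argument column, append the result column."""
    values = [fn(v) for v in block.get(arg).column]
    return Block(block.columns + [ColumnWithTypeAndName(values, result_type, result)])

# Hypothetical table: column name -> (type name, values).
table = {
    "url": ("String", ["/a", "/bb", "/ccc", "/dddd", "/eeeee"]),
    "user_id": ("UInt64", [1, 2, 3, 4, 5]),
}

# Roughly: SELECT sum(length(url)) FROM table, two rows per block.
total = 0
for block in read(table, ["url"], block_size=2):
    block = apply_function(block, len, "url", "length(url)", "UInt64")
    total += sum(block.get("length(url)").column)
print(total)  # prints 20
```

Note that user_id is never read: the storage layer returns only the columns the query references, which is the columnar I/O saving applied at the pipeline level.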
Functions
Two families of functions are defined:
Ordinary functions (e.g., formatDateTime, substring) are stateless and are applied to whole columns at once, using SIMD where the implementation allows it.
Aggregate functions (e.g., uniqCombined) keep intermediate state that can be serialized and sent between nodes, so partial aggregates computed on each shard can be merged. uniqCombined switches representation as cardinality grows: array → hash set → HyperLogLog.
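The array → hash set → HyperLogLog progression, together with mergeable state, can be sketched in Python. The thresholds ARRAY_LIMIT and SET_LIMIT, the precision P = 12, and the BLAKE2b-based hash are assumptions for illustration, not uniqCombined's actual parameters; the estimator is the textbook HyperLogLog with a small-range (linear counting) correction.

```python
import hashlib
import math

ARRAY_LIMIT = 16   # hypothetical threshold: array -> hash set
SET_LIMIT = 4096   # hypothetical threshold: hash set -> HyperLogLog
P = 12             # HyperLogLog precision: 2**P registers
M = 1 << P

def _hash64(value) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

class AdaptiveUniq:
    """Distinct counter whose representation grows with cardinality."""

    def __init__(self):
        self.mode = "array"
        self.items = []         # exact storage in "array" and "set" modes
        self.registers = None   # HyperLogLog registers in "hll" mode

    def add(self, value) -> None:
        if self.mode == "array":
            if value not in self.items:  # linear scan is cheap for tiny sets
                self.items.append(value)
                if len(self.items) > ARRAY_LIMIT:
                    self.mode, self.items = "set", set(self.items)
        elif self.mode == "set":
            self.items.add(value)
            if len(self.items) > SET_LIMIT:
                self._to_hll()
        else:
            self._hll_add(value)

    def _to_hll(self) -> None:
        old = self.items
        self.mode, self.items, self.registers = "hll", None, [0] * M
        for v in old:
            self._hll_add(v)

    def _hll_add(self, value) -> None:
        h = _hash64(value)
        idx = h >> (64 - P)                      # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # position of the first 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "AdaptiveUniq") -> None:
        # Combine a partial state (e.g., from another shard) into this one.
        if self.mode != "hll" and other.mode != "hll":
            for v in other.items:
                self.add(v)
            return
        if self.mode != "hll":
            self._to_hll()
        if other.mode == "hll":
            self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        else:
            for v in other.items:
                self._hll_add(v)

    def count(self) -> int:
        if self.mode != "hll":
            return len(self.items)
        alpha = 0.7213 / (1 + 1.079 / M)
        estimate = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * M and zeros:
            estimate = M * math.log(M / zeros)  # small-range correction
        return round(estimate)
```

The exact modes return exact counts; in HyperLogLog mode the relative standard error is about 1.04 / sqrt(2^P), roughly 1.6% for P = 12. Merging register arrays by element-wise max is what lets shards ship compact partial states instead of raw values.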
Server Interfaces
HTTP endpoint for any external client.
TCP endpoint for the native ClickHouse client and inter‑node communication.
Interserver (replication) interface for exchanging data parts between replicas.
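As an illustration of the HTTP endpoint, the sketch below builds a POST request whose body is the SQL text, using only the Python standard library. It assumes a server on localhost at the default HTTP port 8123 and passes the target database as the database URL parameter; build_query_request and run_query are hypothetical helper names.

```python
import urllib.parse
import urllib.request

def build_query_request(sql: str, host: str = "localhost", port: int = 8123,
                        database: str = "default") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying SQL to the HTTP interface."""
    params = urllib.parse.urlencode({"database": database})
    url = f"http://{host}:{port}/?{params}"
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")

def run_query(sql: str, **kwargs) -> str:
    # Requires a reachable server; results come back as TabSeparated text by default.
    with urllib.request.urlopen(build_query_request(sql, **kwargs), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, run_query("SELECT version()") would return the server version as a line of text, provided a server is reachable. Any HTTP client can do the same, which is why this endpoint suits external tools.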
Performance Validation
Yandex runs continuous integration tests on production‑scale datasets. Real‑world workloads are replayed to verify that algorithmic and hardware optimizations deliver the expected throughput (e.g., >1.7 × 10⁸ rows/second scan rates).
Key Takeaways
Bottom-up, hardware-aware design (maximizing cache usage, using SIMD, and matching algorithms to the data) often yields sub-second response times, even for complex analytical queries.
Modular table‑engine architecture lets users pick the engine that best matches their workload.
Multi‑master, share‑nothing clustering provides fault tolerance and easy horizontal scaling.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community