Why ClickHouse Powers JD Cloud’s Billion‑Row Queries: Architecture and Performance Secrets
This article explains how JD Cloud’s JCHDB, built on ClickHouse, achieves millisecond‑level queries on billions of rows through columnar storage, distributed multi‑master architecture, SIMD vector engine, sparse indexing, and specialized table engines, and outlines the ideal use cases and deployment details.
Recently, Wang Xiangfei, an architect in JD Cloud's product R&D department, delivered an online lecture titled “ClickHouse in JD Cloud: Large‑Scale Application and Architectural Improvements”, sharing practical experience from deploying and optimizing ClickHouse at JD Cloud.
The ClickHouse‑based analytical cloud database JCHDB is now publicly available; users can enable a trial via the JD Cloud console.
JCHDB is an OLAP service built on ClickHouse. Its distributed architecture runs queries in parallel across multiple CPU cores and nodes, and its query performance is 10 to 100 times faster than traditional open‑source databases, which is sufficient for large‑scale business analytics.
ClickHouse, an open‑source analytical database originally developed by Yandex, delivers impressive performance. In benchmark results on 100‑million‑row queries, it outperformed MySQL by 839×, Greenplum by 24×, and Vertica by 5×.
Key reasons for ClickHouse’s speed include:
1. Columnar Storage and Efficient Compression
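ClickHouse stores data column‑wise, so a query reads only the columns it references, which reduces I/O. Values within a column are similar, so they compress well: the default codec is LZ4, with ratios of up to 8:1. Block sizes are tuned to balance the CPU cost of compression and decompression.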
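To make the I/O argument concrete, here is a minimal sketch that compares a row layout with a column layout for a hypothetical three‑column table. `ROWS`, `encode_rows`, `encode_columns`, and `sum_amount_columnar` are illustrative names, not ClickHouse APIs. Python's `zlib` stands in for LZ4, which is not in the standard library, so the compressed size is only indicative.

```python
import struct
import zlib

# Hypothetical toy table of (user_id, country, amount) rows.
ROWS = [(i, "CN" if i % 3 else "US", i % 100) for i in range(10_000)]

def encode_rows(rows):
    """Row-oriented layout: all fields of a row are stored together (10 bytes/row)."""
    return b"".join(
        struct.pack("<I2sI", user_id, country.encode(), amount)
        for user_id, country, amount in rows
    )

def encode_columns(rows):
    """Column-oriented layout: one contiguous buffer per column."""
    return {
        "user_id": b"".join(struct.pack("<I", r[0]) for r in rows),
        "country": b"".join(r[1].encode() for r in rows),
        "amount": b"".join(struct.pack("<I", r[2]) for r in rows),
    }

def sum_amount_columnar(columns):
    """SUM(amount) only has to read (and decompress) the amount column."""
    return sum(v for (v,) in struct.iter_unpack("<I", columns["amount"]))

row_bytes = encode_rows(ROWS)
columns = encode_columns(ROWS)
print("bytes scanned, row layout:   ", len(row_bytes))          # 100000 (whole table)
print("bytes scanned, column layout:", len(columns["amount"]))  # 40000 (one column)
# zlib stands in for LZ4: similar values stored together compress well.
print("amount column compressed:    ", len(zlib.compress(columns["amount"])))
print("SUM(amount) =", sum_amount_columnar(columns))            # 495000
```

A real column store would also compress each column independently and skip decompressing columns that a query never touches. This sketch only shows the byte counts involved.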
2. Distributed Multi‑Master Architecture
Any node can serve read requests, which balances load, and writes do not go through a single master, which removes that bottleneck. Sharding and partitioning spread data evenly across nodes, so queries can be processed in parallel.
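A toy model of hash sharding with peer nodes might look like the sketch below. It assumes a hypothetical `Cluster`/`Node` pair that shares one in‑process object instead of talking over a network. In ClickHouse itself, routing and fan‑out are handled by the Distributed table engine using a sharding key. Any node accepts writes, and whichever node receives a query fans it out to every shard and merges the partial aggregates.

```python
import hashlib

class Cluster:
    """Toy cluster: data is split across shards by hashing a sharding key."""

    def __init__(self, num_shards, sharding_key):
        self.shards = [[] for _ in range(num_shards)]
        self.sharding_key = sharding_key

    def route(self, row):
        # A stable hash of the sharding key spreads rows evenly across shards.
        digest = hashlib.sha256(str(row[self.sharding_key]).encode()).digest()
        self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)].append(row)

    def fan_out_sum(self, column):
        # Each shard computes a partial aggregate locally (in parallel in a real
        # system); the coordinating node only merges the small partial results.
        partials = [sum(row[column] for row in shard) for shard in self.shards]
        return sum(partials)

class Node:
    """Every node is a peer: any of them can accept writes and coordinate reads."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster

    def insert(self, row):
        self.cluster.route(row)

    def query_sum(self, column):
        return self.cluster.fan_out_sum(column)

cluster = Cluster(num_shards=4, sharding_key="user_id")
nodes = [Node(f"node{i}", cluster) for i in range(4)]
for i in range(1_000):
    nodes[i % len(nodes)].insert({"user_id": i, "amount": 2})  # writes via any node
print([len(s) for s in cluster.shards])                 # roughly even shard sizes
print({n.name: n.query_sum("amount") for n in nodes})   # same answer from every node
```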
3. Vector Engine with SIMD Instructions
The vector engine uses CPU SIMD (single instruction, multiple data) instructions to apply one operation to several values at once, which sharply reduces the number of instructions executed. Combined with multi‑core processing, this greatly accelerates computation.
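Python cannot issue SIMD instructions directly, so the following sketch only illustrates the block‑at‑a‑time (vectorized) execution model that makes SIMD possible. Instead of interpreting the query once per row, each step runs one simple, branch‑free operation over a whole block of column values, which a C++ compiler can turn into SIMD loops. The function names and the tiny block size are illustrative assumptions.

```python
def filter_sum_row_at_a_time(prices, qtys, min_price):
    """Tuple-at-a-time: per-row branching and interpretation overhead."""
    total = 0
    for p, q in zip(prices, qtys):
        if p >= min_price:
            total += p * q
    return total

def filter_sum_vectorized(prices, qtys, min_price, block_size=4):
    """Block-at-a-time: each step runs one simple operation over a whole block.

    block_size=4 is purely illustrative; real engines use blocks of thousands of rows.
    """
    total = 0
    for start in range(0, len(prices), block_size):
        p_block = prices[start:start + block_size]
        q_block = qtys[start:start + block_size]
        # 1. Evaluate the predicate over the block, producing a 0/1 mask column.
        mask = [int(p >= min_price) for p in p_block]
        # 2. Multiply two columns element-wise (a SIMD-friendly loop in compiled code).
        products = [p * q for p, q in zip(p_block, q_block)]
        # 3. Apply the mask without branching and accumulate.
        total += sum(v * m for v, m in zip(products, mask))
    return total

prices = [10, 25, 5, 40, 15, 30, 8, 50, 12]
qtys = [1, 2, 3, 1, 2, 1, 4, 1, 2]
print(filter_sum_row_at_a_time(prices, qtys, 12))  # 224
print(filter_sum_vectorized(prices, qtys, 12))     # 224
```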
4. Sparse and Skip‑Indexing
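ClickHouse uses a sparse primary index that stores one entry per fixed‑size granule of rows (8192 rows by default), so the index stays small enough to keep in memory. Optional secondary data‑skipping indexes let ClickHouse skip granules when filtering on other columns. Because the smallest unit read is a whole granule, this design is less suited to point queries that fetch individual rows.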
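The following minimal sketch assumes integer primary keys that are already sorted. It shows how a sparse index with one mark per 8192‑row granule narrows a range scan, and why a point lookup still has to read a whole granule. The function names are hypothetical.

```python
import bisect

INDEX_GRANULARITY = 8192  # default granule size mentioned above

def build_sparse_index(sorted_keys, granularity=INDEX_GRANULARITY):
    """Keep one 'mark': the first primary-key value of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]

def granules_for_range(index, lo, hi):
    """Return (first, last) granule numbers that may hold keys in [lo, hi].

    first > last means no granule can match.
    """
    # Start one mark before the first mark >= lo: that granule may end with lo.
    first = max(bisect.bisect_left(index, lo) - 1, 0)
    # Granules whose first key is greater than hi cannot contain matches.
    last = bisect.bisect_right(index, hi) - 1
    return first, last

keys = list(range(100_000))            # primary key column, already sorted
index = build_sparse_index(keys)
print(len(index))                      # 13 marks instead of 100000 entries
first, last = granules_for_range(index, 20_000, 30_000)
print(first, last, (last - first + 1) * INDEX_GRANULARITY)  # 2 3 16384 rows to read
print(granules_for_range(index, 50_000, 50_000))  # (6, 6): a whole granule for one key
```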
5. Rich Table Engines for Diverse Scenarios
Specialized MergeTree engines such as ReplacingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, SummingMergeTree, and AggregatingMergeTree handle deduplication, pre‑aggregation, and custom aggregation during background merges, reducing the need to preprocess data before loading it.
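The merge semantics of two of these engines can be sketched in a few lines. This is a simplified model under stated assumptions: parts are lists of dicts in insertion order, and `merge_replacing`/`merge_summing` are hypothetical helpers. The summing sketch also drops columns it does not sum. In ClickHouse these collapses happen during background merges, which are not guaranteed to have run when a query executes. For that reason, queries typically still aggregate (or use `FINAL`) to get exact results.

```python
def merge_replacing(parts, key, version):
    """ReplacingMergeTree-style merge: keep the highest-version row per key.

    Parts are given in insertion order; on a version tie the later row wins.
    """
    latest = {}
    for part in parts:
        for row in part:
            current = latest.get(row[key])
            if current is None or row[version] >= current[version]:
                latest[row[key]] = row
    return [latest[k] for k in sorted(latest)]

def merge_summing(parts, key, sum_columns):
    """SummingMergeTree-style merge: collapse rows with the same key by summing."""
    totals = {}
    for part in parts:
        for row in part:
            acc = totals.setdefault(row[key], {key: row[key], **{c: 0 for c in sum_columns}})
            for c in sum_columns:
                acc[c] += row[c]
    return [totals[k] for k in sorted(totals)]

# Two parts written by separate INSERTs, later merged in the background.
profiles = [
    [{"id": 1, "ver": 1, "city": "Beijing"}, {"id": 2, "ver": 1, "city": "Shanghai"}],
    [{"id": 1, "ver": 2, "city": "Shenzhen"}],
]
print(merge_replacing(profiles, key="id", version="ver"))

clicks = [
    [{"page": "home", "clicks": 3}, {"page": "cart", "clicks": 1}],
    [{"page": "home", "clicks": 2}],
]
print(merge_summing(clicks, key="page", sum_columns=["clicks"]))
```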
6. Data Sampling Support
ClickHouse can sample a given percentage of the data, enabling fast approximate statistical analysis on large datasets without scanning all of it.
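ClickHouse exposes this through the `SAMPLE` clause, which relies on a sampling expression declared with `SAMPLE BY` (commonly a hash of a user ID). The sketch below imitates that idea with SHA‑256 as a stand‑in hash. A row is kept whenever its hashed key falls below the requested fraction, so the same rows are chosen on every run and a 5% sample is a subset of a 10% sample. The helper names are hypothetical.

```python
import hashlib

def sample_hash(key):
    """Map a sampling key to a uniform value in [0, 1) (stand-in for a hash like intHash32)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def sample_rows(rows, fraction, key="user_id"):
    """Deterministically keep rows whose hashed key falls in the first `fraction` of the range."""
    return [row for row in rows if sample_hash(row[key]) < fraction]

def estimate_count(rows, fraction, key="user_id"):
    """Scale the sampled count back up to estimate the full-table count."""
    return len(sample_rows(rows, fraction, key)) / fraction

rows = [{"user_id": i} for i in range(100_000)]
print(estimate_count(rows, 0.1))  # close to 100000, computed from roughly 10% of the rows
```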
Recommended use cases include massive data storage and query, user behavior analysis, real‑time reporting, business intelligence, and any real‑time analytical workload.
JCHDB combines ClickHouse with ZooKeeper on a Kubernetes (K8s) platform. It uses ReplicatedMergeTree for multi‑replica consistency, VPC isolation for security, and comprehensive monitoring of both system and ClickHouse‑specific metrics.
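ReplicatedMergeTree uses ZooKeeper to coordinate metadata, while the data parts themselves are fetched directly from peer replicas. The toy model below uses hypothetical `ReplicationLog` and `Replica` classes to capture that split. Each replica writes a part locally and announces it in a shared log, and the other replicas replay the log and copy any parts they are missing. It omits merges, quorum writes, and failure handling.

```python
class ReplicationLog:
    """Stand-in for the shared metadata log that ZooKeeper keeps per replicated table."""

    def __init__(self):
        self.entries = []  # (part_name, source_replica), in commit order

class Replica:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.parts = {}    # part_name -> rows stored locally
        self.applied = 0   # number of log entries this replica has processed

    def insert(self, rows):
        # Write the data part locally, then announce it in the shared log.
        part_name = f"{self.name}_part_{len(self.parts)}"
        self.parts[part_name] = list(rows)
        self.log.entries.append((part_name, self))

    def sync(self):
        # Replay new log entries; data is fetched from the peer, not from the log.
        while self.applied < len(self.log.entries):
            part_name, source = self.log.entries[self.applied]
            if part_name not in self.parts:
                self.parts[part_name] = list(source.parts[part_name])
            self.applied += 1

    def all_rows(self):
        return sorted(row for part in self.parts.values() for row in part)

log = ReplicationLog()
r1, r2 = Replica("r1", log), Replica("r2", log)
r1.insert([1, 2])   # writes can land on either replica
r2.insert([3])
r1.sync()
r2.sync()
print(r1.all_rows(), r2.all_rows())  # [1, 2, 3] [1, 2, 3]
```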
The architecture fits naturally onto K8s: StatefulSets, Helm charts, and custom Operators that manage CRDs enable rapid scaling, high availability across multiple zones, and automated failover.
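An Operator's core job is a reconcile loop. It compares the desired state declared in a custom resource with the pods that actually exist, then creates or deletes pods to close the gap. The sketch below assumes a hypothetical spec format and pod naming scheme, since JCHDB's actual CRD is not described here, and it spreads each shard's replicas round‑robin across availability zones.

```python
def desired_pods(spec):
    """Expand a (hypothetical) cluster spec into pod names and zones.

    StatefulSets give pods stable ordinal names; replicas of a shard are spread
    round-robin over availability zones so one zone outage leaves a replica up.
    """
    pods = {}
    for shard in range(spec["shards"]):
        for ordinal in range(spec["replicas"]):
            name = f"{spec['name']}-shard{shard}-{ordinal}"
            pods[name] = spec["zones"][ordinal % len(spec["zones"])]
    return pods

def reconcile(spec, observed):
    """One pass of an operator's control loop: diff desired state vs observed pods."""
    desired = desired_pods(spec)
    desired_names = set(desired)
    to_create = sorted((name, desired[name]) for name in desired_names - observed)
    to_delete = sorted(observed - desired_names)
    return to_create, to_delete

spec = {"name": "jchdb", "shards": 2, "replicas": 2, "zones": ["az1", "az2"]}
observed = {"jchdb-shard0-0", "jchdb-shard0-1", "jchdb-shard1-0", "jchdb-old-0"}
print(reconcile(spec, observed))
# ([('jchdb-shard1-1', 'az2')], ['jchdb-old-0'])
```

Running the loop repeatedly until it returns no actions is what lets an Operator handle scaling and failover with the same code path.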
JCHDB exposes extensive metrics, such as CPU, memory, disk I/O, QPS, insert rate, backlogged jobs, and active connections, giving users deep observability.
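Metrics like QPS and insert rate are usually derived from cumulative counters, so a dashboard turns two snapshots into per‑second rates. The sketch below assumes counter names modeled on ClickHouse's `system.events` table (`Query`, `InsertedRows`). How JCHDB computes its own metrics is not described in the source.

```python
def counter_rates(prev, curr, interval_s):
    """Convert two snapshots of monotonically increasing counters into per-second rates.

    If a counter went backwards (e.g. the server restarted and counters reset),
    treat the current value as the increase since the reset.
    """
    rates = {}
    for name, value in curr.items():
        before = prev.get(name, 0)
        delta = value - before if value >= before else value
        rates[name] = delta / interval_s
    return rates

prev = {"Query": 1_000, "InsertedRows": 50_000}
curr = {"Query": 1_600, "InsertedRows": 110_000}
print(counter_rates(prev, curr, interval_s=60))  # {'Query': 10.0, 'InsertedRows': 1000.0}
```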
In summary, JCHDB leverages ClickHouse’s full suite of optimizations to deliver high‑performance, scalable OLAP capabilities on JD Cloud, and readers are invited to try the service via the JD Cloud console.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
