
How ClickHouse Achieves Billion‑Row Queries in Seconds: Architecture & Cloud Deployment

This article explains why ClickHouse, the high‑performance columnar OLAP database, can return results on billions of rows within seconds, detailing its columnar storage, MergeTree engine, and how JD Cloud deploys and optimizes it on Kubernetes for scalability and reliability.

JD Cloud Developers

With the rise of big data and IoT, enterprises need databases that can handle massive data volumes efficiently. ClickHouse, originally developed at Yandex, stands out for returning query results on billions of rows within seconds, which has made it popular among major tech companies.

OLTP vs OLAP

Traditional databases like Oracle and MySQL are OLTP (Online Transaction Processing) systems. They focus on low latency, data integrity, and 24/7 availability for transactional workloads. OLAP (Online Analytical Processing) databases, such as ClickHouse, store massive, rarely‑updated datasets and serve multidimensional analytical queries over them, prioritizing high query throughput over transaction speed.

Key Feature: Columnar Storage

ClickHouse uses a column‑oriented storage engine. Row‑based storage reads entire rows even when a query needs only a few columns. Columnar storage instead writes each column to its own file, so a query reads only the columns it references, which dramatically reduces I/O.
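The I/O difference can be illustrated with a small sketch. It is not ClickHouse code: the table, its four column names, and the in‑memory byte strings standing in for files are all assumptions made for illustration. It sums one column under both layouts and reports how many bytes each approach has to read.

```python
import struct

# Hypothetical table: 4 columns of 8-byte integers, 1,000 rows.
NUM_ROWS = 1_000
COLUMNS = ["user_id", "age", "city_id", "amount"]

rows = [(i, 20 + i % 50, i % 10, i * 3) for i in range(NUM_ROWS)]

# Row layout: all values of one row are stored next to each other.
row_store = b"".join(struct.pack("<4q", *r) for r in rows)

# Column layout: one byte string per column (a stand-in for one file per column).
column_store = {
    name: b"".join(struct.pack("<q", r[idx]) for r in rows)
    for idx, name in enumerate(COLUMNS)
}


def sum_amount_row_store():
    """Scan every full row to extract one column; returns (sum, bytes_read)."""
    total = 0
    for _user_id, _age, _city_id, amount in struct.iter_unpack("<4q", row_store):
        total += amount
    return total, len(row_store)


def sum_amount_column_store():
    """Read only the 'amount' column; returns (sum, bytes_read)."""
    data = column_store["amount"]
    total = sum(value for (value,) in struct.iter_unpack("<q", data))
    return total, len(data)


row_total, row_bytes = sum_amount_row_store()
col_total, col_bytes = sum_amount_column_store()
print(row_total == col_total, row_bytes, col_bytes)
```

Both functions compute the same sum, but the row layout touches 32,000 bytes while the column layout touches 8,000, a quarter of the data for a four‑column table. On wide analytical tables with dozens of columns the ratio grows accordingly.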

B+Tree vs MergeTree

InnoDB stores table data in a B+Tree keyed on the primary key: internal nodes hold key values used for navigation, and the leaf nodes hold the data pages with the rows themselves. ClickHouse uses the MergeTree engine instead, which organizes data into sorted parts that are divided into granules, enabling efficient bulk inserts and fast reads.
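A rough way to picture granules is a sparse index that stores one key per granule rather than one entry per row. The sketch below is a simplified model, not ClickHouse's implementation: the function names are hypothetical, the keys are plain integers, and the granule size of 8192 rows matches ClickHouse's default `index_granularity` setting.

```python
from bisect import bisect_left, bisect_right

INDEX_GRANULARITY = 8192  # rows per granule (ClickHouse's default setting)


def build_sparse_index(sorted_keys, granularity=INDEX_GRANULARITY):
    """Keep only the first key of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]


def granules_for_range(index, lo, hi):
    """Return the granule numbers that may contain keys in [lo, hi].

    The choice is conservative: the granule just before the first index
    entry >= lo is included, because it may still end with keys equal to lo.
    """
    first = max(bisect_left(index, lo) - 1, 0)
    last = bisect_right(index, hi) - 1
    if last < first:
        return range(0)
    return range(first, last + 1)


# Hypothetical data: 100,000 rows with primary keys 0..99,999, already sorted.
keys = list(range(100_000))
index = build_sparse_index(keys)

selected = granules_for_range(index, 20_000, 30_000)
print(len(index), list(selected))
```

Here the index holds only 13 entries for 100,000 rows, and the range query needs granules 2 and 3, that is 16,384 rows instead of the whole table. The trade‑off compared with a B+Tree is that a point lookup still scans a whole granule, which suits analytical scans better than single‑row access.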

MergeTree Storage Process

Data is partitioned (e.g., by birth date), and within each partition every column is written to its own file. ClickHouse creates a data part for each inserted batch and assigns it a min and a max block number. An asynchronous background task merges small parts that belong to the same partition. Alongside each column file, a .mrk (mark) file records offsets into the column data so that the rows of a granule can be located quickly.
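The part lifecycle can be sketched as follows. This is a toy model under stated assumptions: the `Part` class and helper functions are hypothetical, block numbers come from a single counter, and the merge combines every part of a partition at once, whereas the real background task picks parts to merge using its own heuristics. The naming pattern `partition_minBlock_maxBlock_level` follows the part directory names ClickHouse uses on disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    partition: str
    min_block: int
    max_block: int
    level: int  # how many rounds of merging produced this part

    @property
    def name(self):
        return f"{self.partition}_{self.min_block}_{self.max_block}_{self.level}"


def insert_batch(parts, partition, block_number):
    """Each inserted batch becomes a new level-0 part covering one block."""
    part = Part(partition, block_number, block_number, 0)
    parts.append(part)
    return part


def merge_partition(parts, partition):
    """Merge all parts of one partition into a single part (simplified)."""
    group = sorted(
        (p for p in parts if p.partition == partition),
        key=lambda p: p.min_block,
    )
    if len(group) < 2:
        return list(parts)
    merged = Part(
        partition,
        group[0].min_block,
        group[-1].max_block,
        max(p.level for p in group) + 1,
    )
    return [p for p in parts if p.partition != partition] + [merged]


parts = []
for block, partition in enumerate(["201901", "201901", "201901", "201902"], start=1):
    insert_batch(parts, partition, block)

before = [p.name for p in parts]
after = [p.name for p in merge_partition(parts, "201901")]
print(before)
print(after)
```

Three inserts into partition 201901 produce `201901_1_1_0`, `201901_2_2_0`, and `201901_3_3_0`. After the merge they are replaced by `201901_1_3_1`, while the part in partition 201902 is untouched, since merges never cross partition boundaries.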

Why JD Cloud Uses Kubernetes for ClickHouse

Kubernetes abstracts underlying infrastructure differences, letting developers focus on database logic.

The platform can be deployed unchanged on public clouds, private clouds, or hybrid environments.

Deployment Process on JD Cloud

JD Cloud extends the open‑source ClickHouse Operator, adding custom APIs, pod‑affinity rules, and security labels so that primary and replica pods are not placed on the same physical node. Configuration is submitted to Kubernetes through Helm charts, and Kubernetes then creates the ClickHouse instances, ZooKeeper clusters, Prometheus monitoring, and Grafana dashboards.
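The article does not show JD Cloud's actual rules, so the following is only a sketch of how such a placement constraint is commonly expressed. It builds a standard Kubernetes `podAntiAffinity` stanza as a Python dictionary and adds a small checker for a given placement. The label key `example.com/shard`, the function names, and the pod and node names are all hypothetical.

```python
from collections import defaultdict


def anti_affinity_spec(shard):
    """Pod-spec 'affinity' stanza: pods of the same shard must use different nodes."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        # Hypothetical label; a real operator defines its own keys.
                        "matchLabels": {"example.com/shard": shard}
                    },
                    # One pod per distinct node hostname.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


def colocated_replicas(placements):
    """Given (pod, shard, node) tuples, return shard/node pairs hosting 2+ pods."""
    seen = defaultdict(list)
    for pod, shard, node in placements:
        seen[(shard, node)].append(pod)
    return {key: pods for key, pods in seen.items() if len(pods) > 1}


good = [("ch-0-0", "shard-0", "node-a"), ("ch-0-1", "shard-0", "node-b")]
bad = [("ch-0-0", "shard-0", "node-a"), ("ch-0-1", "shard-0", "node-a")]

spec = anti_affinity_spec("shard-0")
print(colocated_replicas(good))
print(colocated_replicas(bad))
```

With a required anti‑affinity rule, the scheduler refuses a placement like `bad` and leaves the second pod pending until another node is available. That trades some scheduling flexibility for the guarantee that a single node failure cannot take out both copies of a shard.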

Monitoring, Scaling, and High Availability

Pods can be provisioned with up to 64 CPU cores and 512 GB of memory for demanding workloads.

Kubernetes automatically replaces failed pods and re‑attaches persistent volumes via StatefulSets.

Hot‑scaling of CPU, memory, and storage is supported without downtime.

Integrated Prometheus and Grafana give real‑time visibility into pod resource usage and enable multi‑dimensional alerts via email, SMS, or WeChat.

ClickHouse’s powerful SQL‑compatible engine, columnar storage, and parallel processing enable enterprises to unlock greater value from massive data workloads, provided they adapt their data‑warehouse design and pipelines to fully exploit these capabilities.

