How ClickHouse Powers High‑Performance Time‑Series Data Management at JD’s JUST Engine
This article explains how JD’s JUST platform leverages the open‑source columnar database ClickHouse to store, query and analyze massive time‑series datasets, covering data modeling, lifecycle management, cluster architecture, write and query processes, scaling strategies and future enhancements.
Introduction
ClickHouse is an open‑source columnar OLAP database originally developed at Yandex. JD’s JUST (Urban Computing) platform uses it to store and analyze massive time‑series data.
Time‑Series Data Model
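A time‑series record consists of a Metric (what is measured), a Timestamp, Tags (dimensions that identify the source) and one or more Field/Value pairs. A typical multi‑value model, in which one record carries several fields for the same timestamp, is shown in Table 1.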
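The multi‑value model can be sketched as a small Python data class. This is only an illustration: the class name `Point`, the `series_key` helper and the sample metric, tags and fields are all hypothetical and do not come from JUST.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class Point:
    """One multi-value time-series record (hypothetical layout)."""
    metric: str                                              # what is measured
    timestamp: datetime                                      # when it was sampled
    tags: Dict[str, str] = field(default_factory=dict)       # dimensions for filtering
    fields: Dict[str, float] = field(default_factory=dict)   # several values per timestamp

    def series_key(self) -> str:
        """Identify the series by metric plus sorted tag pairs."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.metric}{{{tag_part}}}"


# Sample values are invented for illustration.
p = Point(
    metric="air_quality",
    timestamp=datetime(2021, 1, 1, 8, 0, tzinfo=timezone.utc),
    tags={"city": "beijing", "station": "s01"},
    fields={"pm25": 35.0, "pm10": 60.0},
)
print(p.series_key())  # air_quality{city=beijing,station=s01}
```

A single‑value model would instead store one field per record, so the two readings above would become two records sharing the same tags and timestamp.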
Time‑Series Data Management Overview
The lifecycle covers data collection, storage, query/analysis and deletion. The resulting requirements are high‑throughput writes, data that is not updated once written, petabyte‑scale storage, real‑time queries, high availability, scalability, and ease of use and maintenance.
Technology Selection
OpenTSDB, InfluxDB, TDengine and ClickHouse are compared; ClickHouse is chosen for its columnar storage, parallel processing, SQL interface and strong performance.
ClickHouse Fundamentals
ClickHouse stores data in column files, uses the MergeTree family of engines, supports multi‑core parallelism, provides HTTP/TCP clients, does not support transactions, and discourages row‑level updates/deletes.
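A toy Python sketch of why columnar storage suits analytical queries: once rows are pivoted into one list per column, an aggregate over a single column reads only that column. The sample rows and the `to_columns` helper are hypothetical, and the sketch ignores compression, sorting and indexing.

```python
from typing import Dict, List

# Invented sample records in row form.
rows = [
    {"metric": "cpu", "ts": 1, "value": 0.5},
    {"metric": "cpu", "ts": 2, "value": 0.7},
    {"metric": "mem", "ts": 1, "value": 0.2},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row records into one list per column, loosely like one file per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column touches only that column's data.
max_value = max(columns["value"])
print(max_value)  # 0.7
```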
Cluster Architecture
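A ClickHouse cluster is made up of instances, shards and replicas, and runs in multi‑master mode: every replica accepts both reads and writes. Replication is coordinated through ZooKeeper, which stores only metadata (such as the replication log), not the table data itself.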
Distributed Engine
A Distributed table stores no data of its own; it maps to the local tables on each shard and forwards reads and writes to them. Example DDL is shown.
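As a rough sketch of how a local table and its Distributed table are usually paired, the Python helpers below build the two DDL statements as strings. The database, table, cluster and column names are hypothetical, and this is not the DDL used by JUST. `{shard}` and `{replica}` are ClickHouse macros that each server substitutes from its own configuration.

```python
def local_table_ddl(db: str, table: str, cluster: str) -> str:
    """DDL for the per-shard local table, using a replicated MergeTree engine."""
    return (
        f"CREATE TABLE {db}.{table}_local ON CLUSTER {cluster}\n"
        "(\n"
        "    metric String,\n"
        "    city   String,\n"      # hypothetical tag column
        "    ts     DateTime,\n"
        "    value  Float64\n"
        ")\n"
        f"ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{table}_local', '{{replica}}')\n"
        "PARTITION BY toYYYYMM(ts)\n"
        "ORDER BY (metric, city, ts)"
    )


def distributed_table_ddl(db: str, table: str, cluster: str) -> str:
    """DDL for the Distributed table that fans out to the local tables."""
    return (
        f"CREATE TABLE {db}.{table} ON CLUSTER {cluster} AS {db}.{table}_local\n"
        f"ENGINE = Distributed({cluster}, {db}, {table}_local, rand())"
    )


print(local_table_ddl("tsdb", "points", "just_cluster"))
print(distributed_table_ddl("tsdb", "points", "just_cluster"))
```

The last argument of `Distributed(...)` is the sharding key; `rand()` spreads rows evenly, while a hash of the series identifier would keep each series on one shard.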
Write Process
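A write proceeds in two stages. In the distributed write, rows are routed to the target shard. In replica synchronization, the replica that received the data records a log entry in ZooKeeper, and the other replicas of that shard pull the entry as a task and fetch the new data.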
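The replica synchronization stage can be modelled with a toy simulation. This is a simplified sketch and not ClickHouse's actual implementation: the `ReplicationLog` class stands in for the log kept in ZooKeeper, and all class names, method names and the part name are hypothetical.

```python
from typing import Dict, List


class ReplicationLog:
    """Stand-in for the shared log in ZooKeeper: part names only, no row data."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def append(self, part_name: str) -> None:
        self.entries.append(part_name)


class Replica:
    def __init__(self, name: str, log: ReplicationLog) -> None:
        self.name = name
        self.log = log
        self.parts: Dict[str, List[dict]] = {}  # part name -> rows stored locally
        self.log_position = 0                    # next log entry to process

    def insert(self, part_name: str, rows: List[dict]) -> None:
        """Write a part locally, then announce it in the shared log."""
        self.parts[part_name] = list(rows)
        self.log.append(part_name)

    def pull(self, peers: List["Replica"]) -> None:
        """Process unseen log entries, fetching missing parts from a peer that has them."""
        while self.log_position < len(self.log.entries):
            part_name = self.log.entries[self.log_position]
            if part_name not in self.parts:
                source = next(p for p in peers if part_name in p.parts)
                self.parts[part_name] = list(source.parts[part_name])
            self.log_position += 1


log = ReplicationLog()
r1, r2 = Replica("r1", log), Replica("r2", log)
r1.insert("part_0001", [{"metric": "cpu", "value": 0.5}])
r2.pull(peers=[r1])
print(r2.parts == r1.parts)  # True
```

Note that the log carries only the part name; the rows themselves travel directly between replicas, which matches the point that ZooKeeper holds metadata rather than data.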
Query Process
A query is routed to one replica. If the query spans multiple shards, the receiving node contacts one replica of each shard and merges the partial results before returning them.
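A minimal scatter‑gather sketch of a multi‑shard query, assuming a simple row count and random replica choice. The function name, replica names and the in‑memory `local_counts` table are hypothetical; a real cluster would run the sub‑queries over the network and in parallel.

```python
import random
from typing import Callable, List


def distributed_count(
    shards: List[List[str]],
    run_on_replica: Callable[[str], int],
    rng: random.Random,
) -> int:
    """Ask one replica per shard for its partial count and sum the results."""
    total = 0
    for replicas in shards:
        replica = rng.choice(replicas)   # any healthy replica of the shard will do
        total += run_on_replica(replica)
    return total


# Invented topology: two shards with two replicas each.
shards = [["s1r1", "s1r2"], ["s2r1", "s2r2"]]
# Replicas of the same shard hold the same data, so they report the same count.
local_counts = {"s1r1": 10, "s1r2": 10, "s2r1": 5, "s2r2": 5}

result = distributed_count(shards, local_counts.__getitem__, random.Random(0))
print(result)  # 15
```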
Important Table Engines
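The engines covered are MergeTree, ReplacingMergeTree (removes duplicate rows that share a sorting key), SummingMergeTree (sums the numeric columns of rows that share a sorting key), AggregatingMergeTree (merges aggregate‑function states) and the Replicated*MergeTree variants, which add replication to each of these.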
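The merge behaviour of two of these engines can be imitated in a few lines of Python. This is a conceptual sketch under simplifying assumptions: the row layouts are hypothetical, and in ClickHouse such merges run in the background at an unspecified time, so duplicates can remain visible until a merge happens.

```python
from typing import Dict, List, Tuple

VersionedRow = Tuple[str, int, float]  # (sorting key, version, value), hypothetical layout


def replacing_merge(rows: List[VersionedRow]) -> List[VersionedRow]:
    """Keep one row per sorting key: the one with the highest version (last wins on ties)."""
    latest: Dict[str, VersionedRow] = {}
    for row in rows:
        key, version, _ = row
        if key not in latest or version >= latest[key][1]:
            latest[key] = row
    return sorted(latest.values())


def summing_merge(rows: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Collapse rows with the same sorting key into one row with summed values."""
    totals: Dict[str, int] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


print(replacing_merge([("a", 1, 10.0), ("b", 1, 5.0), ("a", 2, 11.0)]))
# [('a', 2, 11.0), ('b', 1, 5.0)]
print(summing_merge([("a", 1), ("a", 2), ("b", 3)]))
# [('a', 3), ('b', 3)]
```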
Deployment and High Availability
JUST uses horizontal sharding and at least two replicas per shard. Minimal deployment uses two nodes with either cross‑replica or primary‑backup configurations, and Docker/Kubernetes operators are available.
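One reading of the cross‑replica configuration is that each node holds one shard plus a replica of its neighbour's shard, so every shard survives the loss of a single node. The sketch below computes such a layout; it rests on that assumption, and the function name and node names are hypothetical.

```python
from typing import Dict, List


def cross_replica_layout(nodes: List[str]) -> Dict[str, List[str]]:
    """Place each shard on its own node and on the next node, wrapping around."""
    if len(nodes) < 2:
        raise ValueError("cross replication needs at least two nodes")
    layout: Dict[str, List[str]] = {}
    for i, node in enumerate(nodes):
        neighbour = nodes[(i + 1) % len(nodes)]
        layout[f"shard{i + 1}"] = [node, neighbour]
    return layout


print(cross_replica_layout(["node1", "node2"]))
# {'shard1': ['node1', 'node2'], 'shard2': ['node2', 'node1']}
```

With two nodes this gives two shards and two replicas per shard, whereas a primary‑backup configuration on the same hardware would give a single shard with one replica on each node.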
Dynamic Scaling
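A cluster can be scaled by adding replicas, which requires only a configuration update, or by adding shards, in which case shard weights are adjusted to control how much new data each shard receives. Weight calculations are illustrated.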
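A short sketch of how shard weights translate into write shares. It follows the rule ClickHouse documents for the Distributed engine: the sharding value is taken modulo the total weight, and the row goes to the shard whose half‑open interval contains the remainder. The function names and the example weights are illustrative and are not the figures from the original article.

```python
from fractions import Fraction
from typing import List, Sequence


def write_shares(weights: Sequence[int]) -> List[Fraction]:
    """Expected fraction of new rows per shard: weight / total weight."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def pick_shard(sharding_value: int, weights: Sequence[int]) -> int:
    """Return the index of the shard whose weight interval contains the remainder."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    remainder = sharding_value % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always below the total weight")


# Two existing shards with weight 1 each, plus a new, empty shard with weight 2.
weights = [1, 1, 2]
print(write_shares(weights))  # [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
print([pick_shard(v, weights) for v in range(4)])  # [0, 1, 2, 2]
```

Giving the new shard a higher weight sends it a larger share of incoming rows so that it catches up over time; existing data is not moved by a weight change.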
System Limitations and Future Work
Current JUST features include time‑range queries, tag filtering, down‑sampling and simple analysis; future plans cover real‑time ingestion, advanced aggregation, richer analytics, fault tolerance and full SQL support.
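Down‑sampling, one of the features listed above, can be illustrated with a small Python function that averages values over fixed time buckets. This is a generic sketch, not JUST's implementation; the function name and the sample points are hypothetical, and timestamps are assumed to be integer seconds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def downsample_avg(
    points: List[Tuple[int, float]], bucket_seconds: int
) -> List[Tuple[int, float]]:
    """Average (timestamp, value) points over fixed-width time buckets."""
    if bucket_seconds <= 0:
        raise ValueError("bucket_seconds must be positive")
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds   # align to the bucket boundary
        buckets[bucket_start].append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


# Invented samples: three readings down-sampled to one-minute buckets.
print(downsample_avg([(0, 1.0), (30, 3.0), (60, 5.0)], bucket_seconds=60))
# [(0, 2.0), (60, 5.0)]
```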
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
