How ClickHouse Powers High‑Performance Time‑Series Data Management at JD’s JUST Engine

This article explains how JD’s JUST platform leverages the open‑source columnar database ClickHouse to store, query and analyze massive time‑series datasets, covering data modeling, lifecycle management, cluster architecture, write and query processes, scaling strategies and future enhancements.

JD Cloud Developers

Jan 5, 2021

Introduction

ClickHouse is an open‑source columnar OLAP database developed by Yandex. JD’s JUST (Urban Computing) platform uses it to store and analyze massive time‑series data.

Time‑Series Data Model

Time‑series data consists of Metric, Timestamp, Tags and Field/Value. A typical multi‑value model is shown in Table 1.
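To make the four components concrete, here is a minimal sketch of one multi‑value record. The class name `Point`, the `series_key` helper and the sample metric, tags and fields are hypothetical illustrations, not JUST's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Point:
    """One multi-value time-series record (hypothetical structure)."""
    metric: str                                              # what is being measured
    timestamp: int                                           # Unix epoch seconds
    tags: Dict[str, str] = field(default_factory=dict)       # dimensions used for filtering
    fields: Dict[str, float] = field(default_factory=dict)   # several values per timestamp

    def series_key(self) -> str:
        """Metric plus sorted tags identifies a single series."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.metric}{{{tag_part}}}"


p = Point(
    metric="air_quality",
    timestamp=1609804800,
    tags={"station": "s1", "city": "beijing"},
    fields={"pm25": 35.0, "pm10": 50.0},
)
print(p.series_key())
```

In a multi‑value model, one timestamp carries several fields (here `pm25` and `pm10`), whereas a single‑value model would store each field as its own record.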

Time‑Series Data Management Overview

The lifecycle includes data collection, storage, query/analysis and deletion. Requirements include high‑throughput writes, no updates, petabyte‑scale storage, real‑time queries, high availability, scalability, ease of use and maintenance.

Technology Selection

OpenTSDB, InfluxDB, TDengine and ClickHouse are compared; ClickHouse is chosen for its columnar storage, parallel processing, SQL interface and strong performance.

ClickHouse Fundamentals

ClickHouse stores data in column files, uses the MergeTree family of engines, supports multi‑core parallelism, provides HTTP/TCP clients, does not support transactions, and discourages row‑level updates/deletes.

Cluster Architecture

A ClickHouse cluster is made up of instances, shards and replicas, and runs in multi‑master mode: any replica of a shard can accept reads and writes. Replication is coordinated through ZooKeeper, which stores only metadata, not the data itself.

Distributed Engine

Distributed tables map to local tables across shards. Example DDL is shown.
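As a rough illustration of the local‑to‑distributed mapping, the sketch below renders two DDL statements as strings: a replicated local table that exists on every shard, and a `Distributed` table that fans out to it. The database, table and cluster names, the column set, the ZooKeeper path layout and the `rand()` sharding key are all assumptions chosen for the example, not the statements used by JUST.

```python
def local_table_ddl(db: str, table: str, cluster: str) -> str:
    """Render DDL for the per-shard local table (hypothetical schema)."""
    # {shard} and {replica} are ClickHouse macros resolved from each server's
    # configuration, so they are emitted literally rather than filled in here.
    zk_path = "/clickhouse/tables/{shard}/" + f"{db}/{table}_local"
    return (
        f"CREATE TABLE {db}.{table}_local ON CLUSTER {cluster}\n"
        "(\n"
        "    metric String,\n"
        "    station String,\n"
        "    ts DateTime,\n"
        "    value Float64\n"
        ")\n"
        f"ENGINE = ReplicatedMergeTree('{zk_path}', '{{replica}}')\n"
        "PARTITION BY toYYYYMM(ts)\n"
        "ORDER BY (metric, station, ts)"
    )


def distributed_table_ddl(db: str, table: str, cluster: str) -> str:
    """Render DDL for the Distributed table that maps onto the local tables."""
    return (
        f"CREATE TABLE {db}.{table}_all ON CLUSTER {cluster}\n"
        f"AS {db}.{table}_local\n"
        f"ENGINE = Distributed({cluster}, {db}, {table}_local, rand())"
    )


print(local_table_ddl("tsdb", "points", "just_cluster"))
print(distributed_table_ddl("tsdb", "points", "just_cluster"))
```

The distributed table stores no data of its own; reads and writes against it are forwarded to the `_local` tables on the shards of the named cluster.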

Write Process

The write path has two stages: the distributed write, which routes data to shards, and replica synchronization, in which the write is logged in ZooKeeper and the other replicas pull that log entry as a task and fetch the data.
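The distributed‑write stage has to decide which shard receives each row. The sketch below follows the rule described in the ClickHouse documentation for the Distributed engine: take the sharding expression's value modulo the total shard weight, and pick the shard whose half‑open weight interval contains the remainder. The function name `pick_shard` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def pick_shard(sharding_value: int, weights: Sequence[int]) -> int:
    """Return the index of the shard that should receive a row."""
    # Cumulative upper bounds, e.g. weights [9, 10] -> bounds [9, 19],
    # meaning shard 0 owns remainders [0, 9) and shard 1 owns [9, 19).
    bounds = list(accumulate(weights))
    remainder = sharding_value % bounds[-1]
    return bisect_right(bounds, remainder)


weights = [9, 10]
for value in (0, 8, 9, 18, 19):
    print(value, "-> shard", pick_shard(value, weights))
```

With weights 9 and 10, roughly 9/19 of rows land on the first shard and 10/19 on the second, provided the sharding expression is evenly distributed.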

Query Process

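A query against a distributed table is routed to one replica of each shard it touches, so a multi‑shard query contacts several replicas; the node that received the query merges their partial results.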
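The scatter‑gather pattern behind a multi‑shard query can be sketched in a few lines. This is a simplified model, not ClickHouse's implementation: each shard computes a partial aggregate over its own rows, and the initiating node combines the partials. The function names and sample data are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, float]  # (timestamp, value)


def shard_partial(rows: Iterable[Row], start: int, end: int) -> Tuple[float, int]:
    """Work done on one replica: filter by time range, return (sum, count)."""
    selected = [v for ts, v in rows if start <= ts < end]
    return sum(selected), len(selected)


def merge_avg(partials: List[Tuple[float, int]]) -> Optional[float]:
    """Work done on the initiating node: combine partials into a global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


shards = [
    [(0, 1.0), (10, 2.0)],     # rows held by shard 0
    [(5, 3.0), (99, 100.0)],   # rows held by shard 1
]
partials = [shard_partial(rows, 0, 60) for rows in shards]
avg = merge_avg(partials)
print(partials, avg)
```

Shipping `(sum, count)` pairs rather than per‑shard averages is what keeps the merged result exact when shards hold different numbers of rows.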

Important Table Engines

MergeTree, ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree and ReplicatedXXXMergeTree are described.
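The difference between two of these engines is easiest to see by simulating what a background merge does to rows that share a sorting key. The sketch below is a pure‑Python model under simplifying assumptions (rows as dictionaries, an explicit version column, an explicit list of columns to sum); it is not ClickHouse code, and in a real table duplicates can remain visible until a merge actually runs.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def replacing_merge(rows: List[Row], key_cols: Sequence[str], version_col: str) -> List[Row]:
    """ReplacingMergeTree-style merge: keep one row per key, highest version wins."""
    latest: Dict[tuple, Row] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        # On equal versions the later-inserted row replaces the earlier one.
        if key not in latest or row[version_col] >= latest[key][version_col]:
            latest[key] = row
    return list(latest.values())


def summing_merge(rows: List[Row], key_cols: Sequence[str], sum_cols: Sequence[str]) -> List[Row]:
    """SummingMergeTree-style merge: collapse rows per key by summing numeric columns."""
    totals: Dict[tuple, Dict[str, float]] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        acc = totals.setdefault(key, {c: 0 for c in sum_cols})
        for c in sum_cols:
            acc[c] += row[c]
    return [dict(zip(key_cols, key), **acc) for key, acc in totals.items()]


rows = [
    {"metric": "pm25", "ts": 0, "value": 10.0, "ver": 1},
    {"metric": "pm25", "ts": 0, "value": 12.0, "ver": 2},
    {"metric": "pm25", "ts": 60, "value": 8.0, "ver": 1},
]
deduped = replacing_merge(rows, ["metric", "ts"], "ver")
summed = summing_merge(rows, ["metric"], ["value"])
print(deduped)
print(summed)
```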

Deployment and High Availability

JUST uses horizontal sharding with at least two replicas per shard. A minimal deployment uses two nodes in either a cross‑replica or a primary‑backup configuration, and Docker/Kubernetes operators are available.

Dynamic Scaling

Scaling can add replicas (by updating config) or shards (by adjusting weights). Weight calculations are illustrated.
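One plausible way to choose weights after adding a shard is sketched below. This is an assumption for illustration, not the formula from the original article: given the rows each shard already holds and an estimate of the rows still to be written, it assigns each shard a weight proportional to how far it sits below the balanced target, so the new, empty shard receives the largest share. The function name and the scale of 100 are hypothetical.

```python
from typing import List, Sequence


def rebalancing_weights(current_rows: Sequence[int], incoming_rows: int, scale: int = 100) -> List[int]:
    """Suggest integer shard weights that move shards toward equal size."""
    n = len(current_rows)
    # Size every shard would have if all data, old and new, were spread evenly.
    target = (sum(current_rows) + incoming_rows) / n
    # How many of the incoming rows each shard should absorb to reach the target.
    deficits = [max(target - r, 0.0) for r in current_rows]
    total = sum(deficits)
    if total == 0:
        return [1] * n  # already balanced: equal weights
    return [round(d / total * scale) for d in deficits]


# Two existing shards with 100 rows each, one new empty shard, 300 rows expected.
print(rebalancing_weights([100, 100, 0], 300))
```

Once the shards have converged, the weights would need to be reset to equal values; otherwise the new shard keeps receiving a disproportionate share of writes.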

System Limitations and Future Work

Current JUST features include time‑range queries, tag filtering, down‑sampling and simple analysis; future plans cover real‑time ingestion, advanced aggregation, richer analytics, fault tolerance and full SQL support.
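Of the current features, down‑sampling is the most mechanical, so a minimal sketch may help: points are grouped into fixed‑width time buckets and each bucket is reduced to its average. The function name, the choice of averaging and the sample data are assumptions for illustration; JUST's actual implementation is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def downsample_avg(points: Iterable[Tuple[int, float]], interval: int) -> List[Tuple[int, float]]:
    """Average (timestamp, value) points within fixed-width time buckets."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % interval].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


points = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0)]
print(downsample_avg(points, 60))
```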
