How JD’s Energy Management Platform Leverages ClickHouse for Real‑Time OLAP at Scale

This article explains how JD’s Energy Management Platform uses ClickHouse as an MPP‑based OLAP engine to ingest, store, and provide multi‑dimensional real‑time analytics on energy consumption data, covering architecture decisions, data pipelines, replication, sharding, and a generic query interface.

JD Cloud Developers

Jan 28, 2021

ClickHouse is an OLAP database designed for big‑data scenarios, offering extreme query performance, a lightweight architecture, and simple maintenance.

JD Energy Management Platform, an IoT‑driven product for government and enterprise customers, collects, monitors, analyzes, and alerts on energy consumption data (electricity, water, natural gas) across multiple dimensions such as time (year, month, week, day, hour), manufacturer, workshop, production line type, production line, and device.

The industry introduced Business Intelligence (BI) to support multi‑dimensional data analysis. In contrast to OLTP (Online Transaction Processing) systems, BI‑oriented systems are called OLAP (Online Analytical Processing). Over time, OLAP evolved from single‑machine tools (e.g., Excel) to relational analytical databases (e.g., Microsoft SSAS) and now to real‑time OLAP engines for massive data.

Real‑time OLAP engines for massive data broadly follow one of two approaches:

1. MPP architecture. The service sends each query to all compute nodes, aggregates their partial results, and returns the final answer. Implementations include Presto, Impala, SparkSQL, and Drill. MPP supports flexible data models but needs a large amount of memory to perform well.

2. Pre‑computation systems. Frequently queried metric‑dimension combinations are computed ahead of time and stored with indexes, which speeds up queries. Implementations include Kylin and Druid. This approach offers high performance but lower flexibility and higher maintenance cost.
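The scatter‑gather flow of an MPP engine can be illustrated with a toy Java sketch. It is not how ClickHouse or any of the engines named above is implemented: the class name, the in‑memory "nodes", and the `Reading` type are all hypothetical, and only an additive metric (SUM) is merged.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy scatter-gather: each "node" aggregates its own rows,
// then the coordinator merges the partial sums into one result.
class ScatterGatherSketch {

    // Hypothetical row: one dimension value (device) and one additive metric (kWh).
    static class Reading {
        final String device;
        final double kwh;

        Reading(String device, double kwh) {
            this.device = device;
            this.kwh = kwh;
        }
    }

    // Runs on a single node: SUM(kwh) GROUP BY device over local rows only.
    static Map<String, Double> localAggregate(List<Reading> localRows) {
        Map<String, Double> partial = new TreeMap<>();
        for (Reading r : localRows) {
            partial.merge(r.device, r.kwh, Double::sum);
        }
        return partial;
    }

    // Runs on the coordinator: send the query to every node, then merge the partial results.
    static Map<String, Double> query(List<List<Reading>> nodes) {
        Map<String, Double> merged = new TreeMap<>();
        for (List<Reading> node : nodes) {
            localAggregate(node).forEach((device, sum) -> merged.merge(device, sum, Double::sum));
        }
        return merged;
    }
}
```

Partial sums can be merged by simple addition. An average would need each node to return both a sum and a count, which is one reason MPP engines carry intermediate aggregation state in memory.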

Evaluating performance, flexibility, deployment difficulty, development cost, maintainability, and cloud suitability, JD chose ClickHouse (MPP‑based) as the OLAP engine for the platform.

Platform Architecture

Device Management Platform: Manages device models, status, and data collection.

Message Bus: Kafka message queue using JSON for data exchange between the device management platform and the energy platform.

Differencer: Calculates the delta between successive cumulative reports to produce incrementally additive metrics.

Exception Rule Chain: Provides a set of rules to detect abnormal data; abnormal records are logged and excluded from processing.

OLAP Engine: ClickHouse‑based engine for multi‑dimensional queries.

Multi‑Dimensional Analysis Service: Offers a unified API for querying arbitrary dimension‑metric combinations.

Government & Enterprise UI: Web interface for government and enterprise customers.
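The Differencer and Exception Rule Chain stages can be illustrated together with a minimal Java sketch. The source does not describe JD's actual rules or data model, so the class name, the in‑memory state, the two sample rules (negative delta, implausibly large delta), and the threshold parameter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

// Turns cumulative meter readings into per-report deltas and drops abnormal ones.
class DifferencerSketch {
    // Last cumulative reading seen per device (hypothetical in-memory state).
    private final Map<String, Double> lastReading = new HashMap<>();
    // Rule chain: a delta is treated as abnormal if any rule matches.
    private final List<DoublePredicate> abnormalRules = new ArrayList<>();

    DifferencerSketch(double maxDelta) {
        abnormalRules.add(delta -> delta < 0);        // meter went backwards, e.g. after a reset
        abnormalRules.add(delta -> delta > maxDelta); // implausibly large jump
    }

    // Returns the delta since the previous report, or null when there is no
    // previous report yet or the rule chain rejects the delta.
    Double accept(String deviceId, double cumulative) {
        // The new reading always becomes the baseline, even if its delta is rejected.
        Double previous = lastReading.put(deviceId, cumulative);
        if (previous == null) {
            return null; // first report: nothing to subtract from
        }
        double delta = cumulative - previous;
        for (DoublePredicate rule : abnormalRules) {
            if (rule.test(delta)) {
                System.err.println("abnormal delta " + delta + " for device " + deviceId);
                return null; // logged and excluded from further processing
            }
        }
        return delta;
    }
}
```

Because each accepted delta covers a distinct interval, the deltas can be summed freely along any dimension, which is what makes the metric additive for OLAP queries.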

Data Ingestion

ClickHouse uses a Kafka engine table to implement a typical ETL pipeline: data extraction via a Kafka engine table, transformation via materialized views, and loading into a MergeTree table.

Kafka engine table example:

CREATE TABLE statistics_kafka ON CLUSTER '{cluster}' (
  timestamp UInt64,
  level String,
  message String
) ENGINE = Kafka
SETTINGS
  kafka_broker_list = 'kafka.jd.com:9092',
  kafka_topic_list = 'statistics',
  kafka_group_name = 'gp-st',
  kafka_format = 'JSONEachRow',    -- one JSON object per row
  kafka_skip_broken_messages = 1,  -- tolerate up to 1 unparsable message per block
  kafka_num_consumers = 3;

Materialized view example:

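-- Columns are matched to the target table by name. dt and deviceId are not
-- selected here, so they take their default values in statistics_replica.
CREATE MATERIALIZED VIEW statistics_view ON CLUSTER '{cluster}' TO statistics_replica AS
SELECT timestamp, level, message FROM statistics_kafka;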

MergeTree table example:

CREATE TABLE statistics_replica ON CLUSTER '{cluster}' (
  timestamp UInt64,
  dt String,
  deviceId String,
  level String,
  message String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/statistics_replica', '{replica}')
PARTITION BY dt
ORDER BY (dt, deviceId, level);

Distributed table example (logical view routing queries to local tables):

-- Distributed(cluster, database, local table, sharding key)
CREATE TABLE statistics ON CLUSTER '{cluster}' AS statistics_replica
ENGINE = Distributed(ck_cluster_1, test, statistics_replica, rand());

Replication and Sharding

Replication keeps multiple identical copies of the data on different nodes, using the ReplicatedMergeTree engine with ZooKeeper for coordination. Sharding splits a table's data into shards that are stored on different compute nodes.
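To make sharding concrete, the following Java sketch shows weight‑based shard selection: the sharding key's remainder modulo the total shard weight falls into one shard's interval. It follows the rule described in ClickHouse's Distributed engine documentation but is a simplified illustration, not ClickHouse's code; the class name and the weights in the usage note are assumptions.

```java
// Picks a shard for a row from its sharding-key value and the shard weights.
class ShardRouterSketch {

    // Shard i owns remainders in [sum(weights[0..i-1]), sum(weights[0..i])).
    static int pickShard(long shardingKey, int[] weights) {
        int totalWeight = 0;
        for (int w : weights) {
            totalWeight += w;
        }
        if (totalWeight <= 0) {
            throw new IllegalArgumentException("total shard weight must be positive");
        }
        // Treat the key as unsigned so negative hash values still map to a valid remainder.
        long remainder = Long.remainderUnsigned(shardingKey, totalWeight);
        int upperBound = 0;
        for (int i = 0; i < weights.length; i++) {
            upperBound += weights[i];
            if (remainder < upperBound) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable: remainder is always below totalWeight");
    }
}
```

With weights `{1, 2}`, the second shard receives roughly twice as many rows as the first. A `rand()` sharding key, as in the Distributed table above, spreads rows evenly; a deterministic key such as a hash of `deviceId` would instead keep each device's rows on a single shard.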

Query Interface Design

ClickHouse offers a standard SQL engine accessible via JDBC. To simplify multi‑dimensional queries, a generic interface inspired by MDX was designed, encapsulating dimensions, measures, and filters.

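import java.io.Serializable;
import java.util.List;
import java.util.Map;

// Single entry point: each result row maps a column name to its value.
List<Map<String, Object>> queryStatisticsResult(Query query);

public class Query {
    private List<String> dimensions; // columns to group by
    private List<Measure> measures;  // aggregated metrics
    private List<Filter> where;      // filter conditions (Filter class not shown)
}

public class Measure implements Serializable {
    private String name; // metric name
    private String field; // column name
    private AggregationEnum expression; // aggregation type
}

enum AggregationEnum { SUM, AVG, COUNT, MIN, MAX, COUNT_DISTINCT, PERCENTILE }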

A typical analytical SQL statement:

SELECT day_str, factory_name, workshop_name, prodline_name, device_id,
       SUM(w_total) AS total
FROM statistics
WHERE day_str BETWEEN '2020-10-01' AND '2020-12-31'
GROUP BY day_str, factory_name, workshop_name, prodline_name, device_id
ORDER BY day_str ASC;

This query returns the daily total electricity consumption of every device for Q4 2020, broken down by factory, workshop, and production line. Dimensions appear in the SELECT, WHERE, GROUP BY, and ORDER BY clauses, while metrics appear only in the SELECT clause as aggregate expressions.
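One way a query object of this shape could be turned into SQL is sketched below. The source does not show the translation layer, so this builder, its simplified `Measure` and `Agg` types, and the plain‑string filters are hypothetical. A production version would validate identifiers and bind parameters instead of concatenating strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical translator from dimensions, measures, and filters to a SQL string.
class QuerySqlSketch {

    // Subset of aggregations whose names map directly to SQL functions.
    enum Agg { SUM, AVG, COUNT, MIN, MAX }

    static class Measure {
        final String name;  // alias in the result set
        final String field; // source column
        final Agg agg;      // aggregation to apply

        Measure(String name, String field, Agg agg) {
            this.name = name;
            this.field = field;
            this.agg = agg;
        }
    }

    static String toSql(String table, List<String> dimensions,
                        List<Measure> measures, List<String> filters) {
        // SELECT lists the dimensions first, then one aggregate expression per measure.
        List<String> select = new ArrayList<>(dimensions);
        for (Measure m : measures) {
            select.add(m.agg + "(" + m.field + ") AS " + m.name);
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", select))
                .append(" FROM ").append(table);
        if (!filters.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", filters));
        }
        if (!dimensions.isEmpty()) {
            // Group by every dimension; order by the first one as a simple default.
            sql.append(" GROUP BY ").append(String.join(", ", dimensions))
               .append(" ORDER BY ").append(dimensions.get(0)).append(" ASC");
        }
        return sql.toString();
    }
}
```

Calling `toSql` with the table `statistics`, the five dimensions from the statement above, a single `SUM` measure on `w_total` named `total`, and the `day_str BETWEEN ...` filter produces a statement of the same shape as that example.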

The article demonstrates how JD’s Energy Management Platform applies ClickHouse for real‑time OLAP, covering data ingestion, storage, replication, sharding, and a generic multi‑dimensional query interface, providing a practical reference for building large‑scale analytical systems.
