Big Data 11 min read

Design and Implementation of a Real‑Time OLAP Engine Using ClickHouse in JD Energy Management Platform

This article describes how JD's Energy Management Platform leverages ClickHouse as a high‑performance, MPP‑based OLAP engine to provide real‑time, multi‑dimensional analytics on IoT energy data, covering business background, technology selection, system architecture, data ingestion, storage, replication, and a generic query interface with code examples.

JD Tech Talk

Feb 5, 2021

Design and Implementation of a Real‑Time OLAP Engine Using ClickHouse in JD Energy Management Platform

ClickHouse is an OLAP database designed for big‑data scenarios, offering high query performance, lightweight architecture, and easy maintenance.

The JD Energy Management Platform collects, monitors, and analyzes energy consumption data (electricity, water, gas) from IoT devices, providing multi‑dimensional real‑time analysis and alerts.

Technical selection compared MPP architectures (Presto, Impala, SparkSQL, Drill) and pre‑computation systems (Kylin, Druid); the platform chose ClickHouse for its MPP capabilities, flexibility, and cloud‑deployment suitability.

System architecture includes a device management layer, Kafka message bus, a differential processor, an anomaly rule chain, the ClickHouse OLAP engine, and a multi‑dimensional analysis service exposed via unified APIs.

Data ingestion uses ClickHouse’s Kafka engine table as an ETL pipeline: a Kafka table extracts messages, a materialized view transforms data, and a ReplicatedMergeTree table stores it.

<code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important"> 1 CREATE TABLE statistics_kafka ON CLUSTER '{cluster}' (
 2   timestamp UInt64,
 3   level String,
 4   message String
 5 ) ENGINE = Kafka SETTINGS kafka_broker_list = 'kafka.jd.com:9092',
 6   kafka_topic_list = 'statistics',
 7   kafka_group_name = 'gp-st',
 8   kafka_format = 'JSONEachRow',
 9   kafka_skip_broken_messages = 1,
10   kafka_num_consumers = 3;</code>

<code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important">1 CREATE MATERIALIZED VIEW statistics_view ON CLUSTER '{cluster}' TO statistics_replica AS
2 SELECT timestamp,
3        level,
4        message
5 FROM statistics_kafka;</code>

<code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important">1 CREATE TABLE statistics_replica ON CLUSTER '{cluster}' {
2   timestamp UInt64,
3   dt String,
4   deviceId String,
5   level String,
6   message String
7 } ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/statistics_replica','{replica}')
8 PARTITION BY dt
9 ORDER BY (dt,deviceId,level);

10 CREATE TABLE statistics ON CLUSTER '{cluster}' AS statistics_replica ENGINE = Distributed(ck_cluster_1,test,events_local,rand());</code>

<code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important">1 public List<Map<String,Object>> queryStatisticsResult(Query query);
2 public class Query {
3   private static final long serialVersionUID = 4904019884726531900L;
4   private List<String> dimensions;
5   private List<Measure> measures;
6   private List<Filter> where;
7 }
8 public class Measure implements Serializable {
9   private static final long serialVersionUID = -8556179136317748835L;
10   @NonNull private String name;
11   @NonNull private String field;
12   @NonNull private AggregationEnum expression;
13 }
14 public enum AggregationEnum {SUM,AVG,COUNT,MIN,MAX,COUNT_DISTINCT,PERCENTILE;}</code>

Storage employs ReplicatedMergeTree for high availability and a Distributed table for query routing; replication and sharding are coordinated via Zookeeper.

An OLAP‑oriented generic query interface is designed, inspired by MDX, with Java classes Query, Measure, and AggregationEnum to express dimensions, measures, filters, and aggregation types.

The article concludes that the described architecture demonstrates a practical real‑time OLAP solution for energy data, offering a reference for similar big‑data analytics projects.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL Kafka data modeling ClickHouse OLAP

Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.