Design and Implementation of a Real‑Time OLAP Engine Using ClickHouse in JD Energy Management Platform
This article describes how JD's Energy Management Platform leverages ClickHouse as a high‑performance, MPP‑based OLAP engine to provide real‑time, multi‑dimensional analytics on IoT energy data, covering business background, technology selection, system architecture, data ingestion, storage, replication, and a generic query interface with code examples.
ClickHouse is an OLAP database designed for big‑data scenarios, offering high query performance, lightweight architecture, and easy maintenance.
The JD Energy Management Platform collects, monitors, and analyzes energy consumption data (electricity, water, gas) from IoT devices, providing multi‑dimensional real‑time analysis and alerts.
Technical selection compared MPP architectures (Presto, Impala, SparkSQL, Drill) and pre‑computation systems (Kylin, Druid); the platform chose ClickHouse for its MPP capabilities, flexibility, and cloud‑deployment suitability.
System architecture includes a device management layer, Kafka message bus, a differential processor, an anomaly rule chain, the ClickHouse OLAP engine, and a multi‑dimensional analysis service exposed via unified APIs.
Data ingestion uses ClickHouse’s Kafka engine table as an ETL pipeline: a Kafka table extracts messages, a materialized view transforms data, and a ReplicatedMergeTree table stores it.
<code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important"> 1 CREATE TABLE statistics_kafka ON CLUSTER '{cluster}' (
2 timestamp UInt64,
3 level String,
4 message String
5 ) ENGINE = Kafka SETTINGS kafka_broker_list = 'kafka.jd.com:9092',
6 kafka_topic_list = 'statistics',
7 kafka_group_name = 'gp-st',
8 kafka_format = 'JSONEachRow',
9 kafka_skip_broken_messages = 1,
10 kafka_num_consumers = 3;</code> <code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important">1 CREATE MATERIALIZED VIEW statistics_view ON CLUSTER '{cluster}' TO statistics_replica AS
2 SELECT timestamp,
3 level,
4 message
5 FROM statistics_kafka;</code> <code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important">1 CREATE TABLE statistics_replica ON CLUSTER '{cluster}' {
2 timestamp UInt64,
3 dt String,
4 deviceId String,
5 level String,
6 message String
7 } ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/statistics_replica','{replica}')
8 PARTITION BY dt
9 ORDER BY (dt,deviceId,level);
10 CREATE TABLE statistics ON CLUSTER '{cluster}' AS statistics_replica ENGINE = Distributed(ck_cluster_1,test,events_local,rand());</code> <code style="line-height: 22px; font-size: 14px; letter-spacing: 1px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(33, 33, 33); padding: 0.1em; display: -webkit-box !important">1 public List<Map<String,Object>> queryStatisticsResult(Query query);
2 public class Query {
3 private static final long serialVersionUID = 4904019884726531900L;
4 private List<String> dimensions;
5 private List<Measure> measures;
6 private List<Filter> where;
7 }
8 public class Measure implements Serializable {
9 private static final long serialVersionUID = -8556179136317748835L;
10 @NonNull private String name;
11 @NonNull private String field;
12 @NonNull private AggregationEnum expression;
13 }
14 public enum AggregationEnum {SUM,AVG,COUNT,MIN,MAX,COUNT_DISTINCT,PERCENTILE;}</code>Storage employs ReplicatedMergeTree for high availability and a Distributed table for query routing; replication and sharding are coordinated via Zookeeper.
An OLAP‑oriented generic query interface is designed, inspired by MDX, with Java classes Query, Measure, and AggregationEnum to express dimensions, measures, filters, and aggregation types.
The article concludes that the described architecture demonstrates a practical real‑time OLAP solution for energy data, offering a reference for similar big‑data analytics projects.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
