
Building and Optimizing JD Retail OLAP Platform: Architecture, Management, and Performance Techniques

This article details JD Retail's OLAP platform construction, covering control plane design, architecture, business and operation management, real‑time data updates, materialized view usage, join optimizations, high‑concurrency and high‑throughput scenarios, and promotional preparation strategies, illustrated with diagrams and performance metrics.

DataFunSummit

Speaker Li Yang, a senior OLAP R&D engineer at JD, shares the construction and practice of JD Retail's OLAP platform, covering four major parts: control‑plane construction, optimization techniques, typical business scenarios, and promotion preparation.

1. Control‑Plane Introduction – The control plane provides high reliability, efficient deployment, and sustainable operation. This matters especially for ClickHouse, which performs well but offers weak operational tooling. In the overall architecture, requests pass through domain resolution and routing rules to adminServer, which validates each request and enqueues a task; workers then consume the queue and write results to backend storage. This design handles large‑scale cluster deployments and quota changes.
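The request flow described above can be sketched as a small producer/consumer loop. This is only an illustration under stated assumptions: the in‑memory queue, the `backend_store` dictionary, and the action names are hypothetical stand‑ins, not JD's actual components.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the task queue and backend storage.
task_queue = queue.Queue()
backend_store = {}

ALLOWED_ACTIONS = {"deploy_cluster", "change_quota"}  # assumed action names


def admin_server_submit(request: dict) -> bool:
    """Validate a request and enqueue it; return False if it is rejected."""
    if request.get("action") not in ALLOWED_ACTIONS or not request.get("cluster"):
        return False
    task_queue.put(request)
    return True


def worker() -> None:
    """Consume tasks until a None sentinel arrives, recording each result."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        key = f"{task['cluster']}:{task['action']}"
        backend_store[key] = "done"


def run_demo() -> dict:
    """Submit one valid and one invalid request, then return what was stored."""
    t = threading.Thread(target=worker)
    t.start()
    admin_server_submit({"action": "deploy_cluster", "cluster": "ck_demo"})
    admin_server_submit({"action": "drop_everything", "cluster": "ck_demo"})  # rejected
    task_queue.put(None)  # sentinel: stop the worker
    t.join()
    return dict(backend_store)
```

Separating validation from execution in this way lets the front end reject bad requests immediately while slow operations, such as a cluster deployment, run asynchronously.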

2. Business Management – The control plane offers functions such as cluster account applications, business‑level registration, quota queries (including query count, concurrency, and timeout limits), custom monitoring alerts, and slow‑query statistics, enabling users to manage resources and monitor service reliability.

3. Operation Management – Key capabilities include rapid new‑cluster deployment, node fault handling (e.g., CPU, memory, disk failures), quota control during peak promotions, and cluster health inspections that verify table creation, deletion, insertion, and query operations.
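A health inspection of the kind described above might look like the following sketch. The probe table name and SQL statements are assumptions, and `execute` is a hypothetical callable that you would back with whatever ClickHouse client you use.

```python
from typing import Callable, Dict, List, Tuple

# Table name and statements are illustrative assumptions.
INSPECTION_STEPS: List[Tuple[str, str]] = [
    ("create", "CREATE TABLE IF NOT EXISTS inspect_probe (id UInt64) "
               "ENGINE = MergeTree ORDER BY id"),
    ("insert", "INSERT INTO inspect_probe VALUES (1)"),
    ("query", "SELECT count() FROM inspect_probe"),
    ("drop", "DROP TABLE IF EXISTS inspect_probe"),
]


def inspect_cluster(execute: Callable[[str], object]) -> Dict[str, bool]:
    """Run each step through `execute`; record True on success, False on error."""
    report: Dict[str, bool] = {}
    for name, sql in INSPECTION_STEPS:
        try:
            execute(sql)
            report[name] = True
        except Exception:
            # Keep going so the remaining steps (including cleanup) still run.
            report[name] = False
    return report
```

Passing the executor in as a parameter keeps the check independent of any particular driver and makes it easy to exercise with a fake.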

Figure: Control Plane Diagram

4. Optimization Techniques

Scenario Challenges – JD Retail faces complex transaction logic with many tables and real‑time updates, as well as massive traffic data that is append‑only, high‑volume, and frequently changing in quality.

Real‑time Data Updates – Three deduplication methods are presented, each with a different scope: OPTIMIZE merges duplicates within a partition, FINAL deduplicates at query time within a local table, and argMax deduplicates across shards through the distributed table. Each method’s scope is explained with example tables and query results.
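To make the three methods concrete, here is a minimal sketch. The SQL strings show the usual ClickHouse forms with assumed table, column, and partition names (`orders_local`, `orders_all`, `version`), and the Python function emulates what the argMax query computes; none of this is taken from the talk's own examples.

```python
from typing import Dict, List, Tuple

# ClickHouse-side forms of the three methods (all names are assumptions).
DEDUP_SQL = {
    "optimize": "OPTIMIZE TABLE orders_local PARTITION '2023-06-18' FINAL",
    "final": "SELECT order_id, status FROM orders_local FINAL",
    "argmax": "SELECT order_id, argMax(status, version) AS status "
              "FROM orders_all GROUP BY order_id",
}

Row = Tuple[int, str, int]  # (order_id, status, version)


def argmax_dedup(rows: List[Row]) -> Dict[int, str]:
    """Emulate argMax(status, version) GROUP BY order_id in plain Python."""
    latest: Dict[int, Tuple[int, str]] = {}
    for order_id, status, version in rows:
        # Keep the status that belongs to the highest version seen so far.
        if order_id not in latest or version > latest[order_id][0]:
            latest[order_id] = (version, status)
    return {order_id: status for order_id, (_, status) in latest.items()}
```

For example, two versions of order 1 collapse to the status carried by the higher version, regardless of which shard each row landed on.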

Figure: Real‑time Update Diagram

Materialized Views – Using SummingMergeTree for pre‑aggregation reduces query time on a 13‑billion‑row table from 2.1 s to 0.002 s. The talk reports this as roughly a 113× speedup, although the two timings as quoted imply closer to 1,000×. The view creates a hidden internal table that stores the aggregated results, so queries scan far less data.
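The effect of pre‑aggregation can be illustrated with a small sketch. The DDL string shows one plausible shape for such a view, with hypothetical table and column names (`sales`, `sales_mv`, `dt`, `sku`, `amount`), and the Python function emulates the row collapsing that a SummingMergeTree performs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed ClickHouse DDL for comparison; all names are hypothetical.
MV_DDL = """
CREATE MATERIALIZED VIEW sales_mv
ENGINE = SummingMergeTree()
ORDER BY (dt, sku)
AS SELECT dt, sku, sum(amount) AS amount
FROM sales
GROUP BY dt, sku
"""


def pre_aggregate(rows: Iterable[Tuple[str, str, int]]) -> Dict[Tuple[str, str], int]:
    """Collapse detail rows into one summed row per (dt, sku) key."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for dt, sku, amount in rows:
        totals[(dt, sku)] += amount
    return dict(totals)
```

Because SummingMergeTree only collapses rows when background merges run, queries against such a view should still use `sum()` with `GROUP BY` to get exact totals.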

Figure: Materialized View Diagram

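Join Optimization – Two strategies are described. A global join builds the right‑hand side once as a temporary merged table and sends it to every shard, which reduces the number of partial queries from N² to 2 × N for N shards. A local join rewrites the right side to use the local table, so each shard joins only its own data and network traffic drops; this assumes rows with matching join keys are stored on the same shard. The recommended order is to try a local join first, then a global join, and in either case to place the smaller table on the right.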
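The N² versus 2 × N figures follow from simple counting, sketched below. The function names are hypothetical, and the 60‑shard example is chosen only for illustration.

```python
def subqueries_plain_distributed_join(shards: int) -> int:
    """Each of N shards re-queries all N shards for the right table: N * N."""
    return shards * shards


def subqueries_global_join(shards: int) -> int:
    """Right side gathered once (N reads), then the join sent once per shard (N)."""
    return 2 * shards
```

On a 60‑shard cluster this is 3,600 partial queries for the plain distributed join against 120 for the global join.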

Figure: Join Optimization Diagram

5. Typical Business Scenarios

High‑Concurrency Queries – Example: an advertising real‑time tracking project reaches ~2000 QPS during a 618 promotion. Optimizations include increasing replicas, limiting max_threads, and adjusting query_thread_log storage to prevent disk saturation, raising stable QPS from ~1000 to ~2000.
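A back‑of‑envelope model suggests why adding replicas and capping `max_threads` both raise sustainable QPS. This is a simplification under stated assumptions (reads spread evenly across replicas, CPU as the only bottleneck), and every number in the usage note is hypothetical rather than taken from the talk.

```python
def concurrent_query_slots(cores_per_node: int, max_threads: int) -> int:
    """Rough count of queries a node can run at once without oversubscribing cores."""
    return max(1, cores_per_node // max_threads)


def estimated_qps(replicas: int, slots_per_node: int, avg_query_ms: float) -> float:
    """Rough aggregate QPS if replicas share reads evenly and CPU is the limit."""
    return replicas * slots_per_node * 1000 / avg_query_ms
```

With these assumptions, lowering `max_threads` from 16 to 8 on a 64‑core node doubles the slots from 4 to 8, and doubling the replica count doubles the estimate again.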

Figure: Concurrency Diagram

High‑Throughput Writes – Example: JD Cloud monitoring writes up to 6000 billion rows per day (≈6 GB/s) using a 60‑shard, 2‑replica cluster. Techniques include chproxy load balancing at the SQL level and writing directly to local tables rather than through the distributed table, which makes writes 2–3× faster.
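The arithmetic and the routing idea can be sketched as follows. The round‑robin assignment here is a simple stand‑in for a load balancer and does not reflect chproxy's actual configuration or algorithm; host and batch names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def per_shard_rate_mb_s(total_gb_s: float, shards: int) -> float:
    """Average ingest rate each shard must absorb, in MB/s (1 GB = 1000 MB)."""
    return total_gb_s * 1000 / shards


def route_batches(batches: List[str], shard_hosts: List[str]) -> Dict[str, List[str]]:
    """Assign batches round-robin to shard hosts, each writing its own local table."""
    plan: Dict[str, List[str]] = {host: [] for host in shard_hosts}
    for batch, host in zip(batches, cycle(shard_hosts)):
        plan[host].append(batch)
    return plan
```

At 6 GB/s across 60 shards, each shard absorbs about 100 MB/s on average, before counting replication to the second replica.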

Figure: Write Throughput Diagram

6. Promotion Preparation (Big‑Event Readiness)

Steps include resource collection and tier confirmation, monitoring and alert subscription, stress testing with realistic quotas, fault‑drill rehearsals (e.g., dual‑stream switch‑over), and downgrade measures for lower‑priority services. These ensure stable operation during traffic spikes.
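As one way to picture the downgrade step, the sketch below scales back query quotas for lower‑priority services while leaving protected tiers untouched. The tier numbering (lower number means higher priority), service names, and scaling factor are all assumptions made for illustration.

```python
from typing import Dict


def apply_downgrade(quotas: Dict[str, int], tiers: Dict[str, int],
                    protect_up_to_tier: int, factor: float) -> Dict[str, int]:
    """Scale down quotas for services whose tier number exceeds the protected tier."""
    adjusted: Dict[str, int] = {}
    for service, quota in quotas.items():
        if tiers[service] > protect_up_to_tier:
            adjusted[service] = int(quota * factor)  # lower priority: reduce quota
        else:
            adjusted[service] = quota  # protected tier: unchanged
    return adjusted
```

For instance, with tiers 0 and 1 protected and a factor of 0.5, a tier‑2 reporting service with a quota of 400 would drop to 200 while a tier‑0 service keeps its full quota.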

Figure: Promotion Preparation Diagram

7. Q&A Highlights

Key challenges are high concurrency (solved by replica expansion, thread limits, log management, and quota scaling) and join optimization (local vs dictionary joins). JD Retail primarily uses ClickHouse with Doris as a secondary engine for OLAP workloads.

Overall, the presentation demonstrates how JD Retail builds a reliable, high‑performance OLAP platform to support massive e‑commerce analytics, high‑throughput ingestion, and large‑scale promotional events.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
