Building and Optimizing JD Retail OLAP Platform: Architecture, Management, and Performance Techniques
This article details JD Retail's OLAP platform construction, covering control plane design, architecture, business and operation management, real‑time data updates, materialized view usage, join optimizations, high‑concurrency and high‑throughput scenarios, and promotional preparation strategies, illustrated with diagrams and performance metrics.
Speaker Li Yang, a senior OLAP R&D engineer at JD, shares the construction and practice of JD Retail's OLAP platform, covering four major parts: control‑plane construction, optimization techniques, typical business scenarios, and promotion preparation.
1. Control‑Plane Introduction – The control plane provides high reliability, efficient deployment, and sustainable operation. This matters especially for ClickHouse, whose query performance is high but whose built‑in operational tooling is weak. Requests pass through domain resolution and routing rules to adminServer, which validates them and enqueues tasks; workers then consume the queue and write results to backend storage. This design handles large‑scale cluster deployments and quota changes.
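The request flow described above (validate, enqueue, consume, persist) can be sketched as follows. This is a minimal illustration, not JD's implementation: the talk names adminServer, a task queue, workers, and backend storage but does not describe their interfaces, so every function name, the action list, and the in‑memory dict standing in for backend storage are assumptions.

```python
import queue
import threading

# Hypothetical stand-ins for the components named in the talk.
task_queue = queue.Queue()
backend_storage = {}            # task id -> result (stand-in for real storage)
storage_lock = threading.Lock()

ALLOWED_ACTIONS = {"deploy_cluster", "change_quota"}  # assumed action names


def admin_server_submit(task_id, action, payload):
    """Validate a request and enqueue it; reject anything malformed."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if not isinstance(payload, dict):
        raise ValueError("payload must be a dict")
    task_queue.put((task_id, action, payload))


def worker():
    """Consume tasks until a None sentinel arrives, recording each result."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        task_id, action, payload = task
        result = {"action": action, "status": "done", "payload": payload}
        with storage_lock:
            backend_storage[task_id] = result
        task_queue.task_done()


def run_workers(num_workers=2):
    """Start workers, wait for the queue to drain, then stop the workers."""
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    task_queue.join()            # block until every queued task is processed
    for _ in threads:
        task_queue.put(None)     # one sentinel per worker
    for t in threads:
        t.join()
```

Decoupling validation from execution through a queue lets the front end accept bursts of deployment or quota requests while a fixed pool of workers applies them at a controlled rate.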
2. Business Management – The control plane offers functions such as cluster account applications, business‑level registration, quota queries (including query count, concurrency, and timeout limits), custom monitoring alerts, and slow‑query statistics, enabling users to manage resources and monitor service reliability.
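To make the three quota dimensions concrete, here is a small sketch of an admission check covering query count, concurrency, and timeout. The class names, the sliding 60‑second window, and the rejection messages are assumptions made for illustration; the talk does not describe how the platform enforces quotas.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Quota:
    max_queries_per_window: int   # query-count limit
    max_concurrency: int          # simultaneous queries allowed
    timeout_seconds: float        # per-query execution limit
    window_seconds: float = 60.0  # assumed counting window


@dataclass
class QuotaTracker:
    quota: Quota
    running: int = 0
    started_at: list = field(default_factory=list)

    def try_admit(self, now=None):
        """Return (admitted, reason). Admitted queries must call finish()."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.quota.window_seconds
        # Forget queries that started before the current window.
        self.started_at = [t for t in self.started_at if t > cutoff]
        if len(self.started_at) >= self.quota.max_queries_per_window:
            return False, "query count exceeded"
        if self.running >= self.quota.max_concurrency:
            return False, "concurrency exceeded"
        self.started_at.append(now)
        self.running += 1
        return True, "ok"

    def finish(self):
        """Release one concurrency slot."""
        self.running = max(0, self.running - 1)

    def timed_out(self, started, now):
        """True if a query that began at `started` has exceeded its limit."""
        return now - started > self.quota.timeout_seconds
```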
3. Operation Management – Key capabilities include rapid new‑cluster deployment, node fault handling (e.g., CPU, memory, disk failures), quota control during peak promotions, and cluster health inspections that verify table creation, deletion, insertion, and query operations.
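A health inspection of the kind described (create, insert, query, delete) might look like the sketch below. The `execute` callable, the probe table name, and the generic SQL are all hypothetical; SQLite is used only so the example runs on its own, and a real ClickHouse probe would also need an ENGINE clause on CREATE TABLE.

```python
import sqlite3


def inspect_cluster(execute, table="health_probe"):
    """Run create -> insert -> query -> drop and report which steps passed.

    `execute` is a hypothetical callable that runs one SQL statement and
    returns the fetched rows.
    """
    steps = [
        ("create", f"CREATE TABLE {table} (id INTEGER)", None),
        ("insert", f"INSERT INTO {table} VALUES (1)", None),
        ("query", f"SELECT count(*) FROM {table}", [(1,)]),
        ("drop", f"DROP TABLE {table}", None),
    ]
    report = {}
    for name, sql, expected in steps:
        try:
            rows = execute(sql)
            # Steps without an expected result pass if they do not raise.
            report[name] = expected is None or list(rows) == expected
        except Exception:
            report[name] = False
    return report


def sqlite_executor(conn):
    """Adapt a sqlite3 connection to the `execute` interface above."""
    def execute(sql):
        return conn.execute(sql).fetchall()
    return execute
```

Calling `inspect_cluster(sqlite_executor(sqlite3.connect(":memory:")))` returns a dict with one boolean per step, which an inspection job could turn into an alert when any value is false.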
4. Optimization Techniques
Scenario Challenges – JD Retail faces two kinds of workload: transaction data, with complex logic spread across many tables and updated in real time, and traffic data, which is append‑only, very high in volume, and variable in quality.
Real‑time Data Updates – Three deduplication methods are presented: OPTIMIZE (partition‑level), FINAL (local‑table level), and argMax (distributed‑table level). Each method’s scope is explained with example tables and query results.
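The three scopes can be illustrated with the statements below, followed by a pure‑Python emulation of what argMax computes. The table, column, and partition names are hypothetical, and the Python function is only a model of the aggregate's behavior, not ClickHouse code. Because argMax is an aggregate evaluated over rows from every shard, it can deduplicate at the distributed‑table level, which is the scope the talk assigns to it.

```python
# Illustrative ClickHouse statements for the three scopes; all table, column,
# and partition names are hypothetical.
DEDUP_SQL = {
    "optimize": "OPTIMIZE TABLE orders_local PARTITION '2024-06-18' FINAL",
    "final": "SELECT order_id, status FROM orders_local FINAL",
    "argmax": (
        "SELECT order_id, argMax(status, version) "
        "FROM orders_all GROUP BY order_id"
    ),
}


def arg_max_by_key(rows):
    """Emulate `SELECT key, argMax(value, version) ... GROUP BY key`.

    `rows` is an iterable of (key, value, version) tuples. For each key the
    value belonging to the highest version wins; on ties this sketch keeps
    the first row it saw.
    """
    latest = {}
    for key, value, version in rows:
        if key not in latest or version > latest[key][1]:
            latest[key] = (value, version)
    return {key: value for key, (value, _) in latest.items()}


# Rows for the same order may live on different shards.
shard_a = [(1, "created", 1), (2, "created", 1)]
shard_b = [(1, "paid", 2), (2, "cancelled", 3), (1, "created", 1)]
current_status = arg_max_by_key(shard_a + shard_b)
```

Here `current_status` maps order 1 to "paid" and order 2 to "cancelled", regardless of which shard held the newest version.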
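Materialized Views – Using SummingMergeTree for pre‑aggregation reduces query time from 2.1 s to 0.002 s on a 13‑billion‑row table. The speedup is reported as ≈113×, although the two timings as quoted imply roughly 1,000×, so one of the figures is likely rounded or mistranscribed. The view creates a hidden internal table that stores the aggregated results, dramatically shrinking the amount of data each query has to scan.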
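The effect of pre‑aggregation can be modeled in a few lines. The sketch below mimics what a SummingMergeTree‑backed view does conceptually: rows sharing the same key are collapsed into one row with summed metrics, so a query scans the small summary instead of the raw data. The column names and sample rows are invented for illustration.

```python
from collections import defaultdict


def pre_aggregate(rows):
    """Collapse (day, sku, clicks) rows into one total per (day, sku) key."""
    totals = defaultdict(int)
    for day, sku, clicks in rows:
        totals[(day, sku)] += clicks
    return dict(totals)


def clicks_for_day(summary, day):
    """Answer a per-day query from the summary rather than the raw rows."""
    return sum(v for (d, _), v in summary.items() if d == day)


raw_rows = [
    ("2024-06-18", "sku1", 3),
    ("2024-06-18", "sku1", 2),
    ("2024-06-18", "sku2", 5),
    ("2024-06-19", "sku1", 1),
]
summary = pre_aggregate(raw_rows)
```

In ClickHouse itself the summing happens during background merges, so queries against such a table should still apply `sum()` with `GROUP BY` to combine rows that have not been merged yet.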
Join Optimization – Two strategies are described: global join (builds the right side once as a temporary merged table and ships it to every shard, reducing the number of partial queries from N² to 2 × N) and local join (joins against a local table on the right side, reducing network traffic). The recommended order is local join first, then global join, and finally placing the smaller table on the right.
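The N² versus 2 × N figures are easy to check with a short calculation. The counts for the distributed and global cases follow the talk; the count of N for a local join is an assumption of this sketch (each shard joins only its own local tables, with no cross‑shard subqueries).

```python
def partial_queries(num_shards, strategy):
    """Number of per-shard partial queries for a two-table distributed join."""
    if num_shards < 1:
        raise ValueError("num_shards must be positive")
    if strategy == "distributed":
        # Every shard re-queries all shards for the right-hand table.
        return num_shards ** 2
    if strategy == "global":
        # Right side gathered once (N), then the join runs on each shard (N).
        return 2 * num_shards
    if strategy == "local":
        # Assumed: data is co-located, so each shard runs one local join.
        return num_shards
    raise ValueError(f"unknown strategy: {strategy}")


# Example with the 60-shard cluster size mentioned elsewhere in the talk.
counts = {s: partial_queries(60, s) for s in ("distributed", "global", "local")}
```

For 60 shards this gives 3600, 120, and 60 partial queries respectively, which is consistent with the recommended preference for local joins.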
5. Typical Business Scenarios
High‑Concurrency Queries – Example: an advertising real‑time tracking project reaches ~2000 QPS during the 618 promotion. Optimizations include adding replicas, capping max_threads per query, and adjusting query_thread_log storage to prevent disk saturation. Together these raised stable QPS from ~1000 to ~2000.
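A rough way to see why adding replicas helps is Little's law: the average number of in‑flight queries equals throughput multiplied by average latency. The sketch below applies it to the QPS figure from the talk. The 50 ms latency and the replica counts are assumptions, since the source gives neither.

```python
def in_flight_queries(qps, avg_latency_s):
    """Little's law: average concurrency = arrival rate * time in system."""
    return qps * avg_latency_s


def per_replica_concurrency(qps, avg_latency_s, replicas):
    """Average in-flight queries per replica, assuming an even spread."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return in_flight_queries(qps, avg_latency_s) / replicas


ASSUMED_LATENCY_S = 0.05  # hypothetical average query latency

# Doubling replicas halves the concurrency each replica must sustain.
load_with_2 = per_replica_concurrency(2000, ASSUMED_LATENCY_S, 2)
load_with_4 = per_replica_concurrency(2000, ASSUMED_LATENCY_S, 4)
```

Capping max_threads works on the other side of the same budget: each query uses fewer cores, so a replica can run more queries side by side before CPU contention raises latency.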
High‑Throughput Writes – Example: JD Cloud monitoring writes up to 6000 billion rows per day (≈6 GB/s) using a 60‑shard, 2‑replica cluster. Techniques include chproxy load balancing at the SQL level and using local tables for 2–3× faster writes.
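Taking the figures exactly as stated, the per‑second and per‑shard load can be worked out directly, and the idea of spreading write batches across shards' local tables can be sketched with a simple round‑robin. The host names are hypothetical, GB is taken as 10^9 bytes, and the round‑robin function is only a simplified stand‑in for what a load balancer such as chproxy does.

```python
from itertools import cycle

ROWS_PER_DAY = 6000 * 10**9       # "6000 billion rows per day", as stated
BYTES_PER_SECOND = 6 * 10**9      # "about 6 GB/s", assuming GB = 10^9 bytes
SHARDS = 60
SECONDS_PER_DAY = 86_400

rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY
rows_per_second_per_shard = rows_per_second / SHARDS
bytes_per_row = BYTES_PER_SECOND / rows_per_second


def assign_batches(batches, shard_hosts):
    """Round-robin whole batches across shard hosts.

    Each batch would be inserted into the local table on its assigned host,
    so no node has to re-split and forward rows to other shards.
    """
    hosts = cycle(shard_hosts)
    return [(next(hosts), batch) for batch in batches]


plan = assign_batches(["b0", "b1", "b2", "b3"], ["shard-01", "shard-02", "shard-03"])
```

Under these assumptions the cluster ingests roughly 69 million rows per second, or about 1.16 million rows per second per shard, at an average of about 86 bytes per row.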
6. Promotion Preparation (Big‑Event Readiness)
Steps include resource collection and tier confirmation, monitoring and alert subscription, stress testing with realistic quotas, fault‑drill rehearsals (e.g., dual‑stream switch‑over), and downgrade measures for lower‑priority services. These ensure stable operation during traffic spikes.
7. Q&A Highlights
Key challenges are high concurrency (solved by replica expansion, thread limits, log management, and quota scaling) and join optimization (local vs dictionary joins). JD Retail primarily uses ClickHouse with Doris as a secondary engine for OLAP workloads.
Overall, the presentation demonstrates how JD Retail builds a reliable, high‑performance OLAP platform to support massive e‑commerce analytics, high‑throughput ingestion, and large‑scale promotional events.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
