JD's OLAP Architecture: Design, Challenges, and Solutions

This article explains how JD builds its OLAP platform, from data ingestion through storage and querying to management. It covers the diverse data sources, real‑time and offline processing, scalability, consistency, and fault tolerance, along with the key technical challenges, their solutions, and future optimization plans.

DataFunTalk

May 5, 2021

Guest Speaker: Li Yang, Senior R&D Engineer at JD.

Overview: The talk walks through JD's end‑to‑end OLAP construction. It starts from business demand scenarios, analyzes existing problems, proposes solutions, and outlines how the OLAP system evolved.

Demand Scenarios

1. JD Data Ingress

① Business Data – Orders: JD's e‑commerce platform generates order data, which is analyzed along multiple dimensions such as store, product category, and conversion rate.

② Behavioral Data – Clicks and Searches: User clicks and searches are combined with order information for funnel analysis and conversion‑rate calculation.

③ Advertising and Recommendations: Targeted ads and recommendations are delivered based on order behavior, and their effectiveness is measured.

④ Monitoring Metrics: Operational metrics are monitored alongside user behavior data.

2. JD Data Egress

Data egress falls into two categories: offline and real‑time.

Offline: weekly/monthly reports, financial statements, and machine‑learning training data.

Real‑time: interactive queries for analysts, real‑time dashboards for promotions, and dynamic resource adjustments.

Key Issues and Solutions

1. Write

Data source diversity: data arrives from files, HDFS, and Kafka/MQ in various formats (CSV, TSV, JSON, AVRO, PARQUET, BINLOG). Solution: a unified import service that abstracts away source types. Users configure an import through a visual UI by selecting the topic, target, format, and field types.
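The unified import idea can be sketched as one parser per format behind a common interface. The user‑configured field types are then applied the same way no matter where the data came from. This is a minimal sketch, not JD's service. ImportJob, PARSERS and run_import are hypothetical names, and only CSV, TSV and JSON are handled.

```python
import csv
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical import-job spec mirroring the UI fields (topic, target, format, field types).
@dataclass
class ImportJob:
    topic: str
    target: str
    fmt: str                                            # "csv", "tsv" or "json" in this sketch
    field_types: Dict[str, Callable[[object], object]]  # column name -> type converter

def _parse_delimited(lines: Iterable[str], columns: List[str], delimiter: str) -> Iterator[Dict[str, object]]:
    # Positional formats: map each field to the configured column order.
    for row in csv.reader(lines, delimiter=delimiter):
        yield dict(zip(columns, row))

def _parse_json(lines: Iterable[str], columns: List[str]) -> Iterator[Dict[str, object]]:
    # Self-describing format: pick the configured columns out of each JSON object.
    for line in lines:
        obj = json.loads(line)
        yield {c: obj.get(c) for c in columns}

# One parser per format; downstream code never sees the source format.
PARSERS: Dict[str, Callable[[Iterable[str], List[str]], Iterator[Dict[str, object]]]] = {
    "csv": lambda lines, cols: _parse_delimited(lines, cols, ","),
    "tsv": lambda lines, cols: _parse_delimited(lines, cols, "\t"),
    "json": _parse_json,
}

def run_import(job: ImportJob, lines: Iterable[str]) -> List[Dict[str, object]]:
    parser = PARSERS.get(job.fmt)
    if parser is None:
        raise ValueError(f"unsupported format: {job.fmt}")
    columns = list(job.field_types)
    # Apply the user-configured field types uniformly, whatever the source format.
    return [{c: job.field_types[c](raw[c]) for c in columns} for raw in parser(lines, columns)]
```

Supporting AVRO, PARQUET or BINLOG would mean registering another entry in PARSERS. Nothing else in the pipeline would change, which is the point of abstracting source types.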

2. Timeliness

Real‑time data must be computed immediately, while offline data can be processed in batches. Solution: physically isolate real‑time and offline clusters so they do not interfere with each other, and allocate resources to each accordingly.
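Physical isolation means each workload is admitted against its own cluster's capacity, so a heavy batch job cannot starve a real‑time query. The sketch below is hypothetical throughout: the cluster names, capacity units and submit function are assumptions, not JD's scheduler.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    """A physically separate pool of nodes with its own capacity (hypothetical units)."""
    name: str
    capacity: int
    used: int = 0

    def try_admit(self, cost: int) -> bool:
        # Admission is checked against this cluster's capacity only.
        if self.used + cost > self.capacity:
            return False
        self.used += cost
        return True

# Hypothetical cluster names and sizes.
CLUSTERS = {
    "realtime": Cluster("olap-rt", capacity=10),
    "offline": Cluster("olap-batch", capacity=100),
}

def submit(job_cost: int, realtime: bool) -> bool:
    """Route a job by timeliness; real-time and offline work never share a pool."""
    pool = CLUSTERS["realtime" if realtime else "offline"]
    return pool.try_admit(job_cost)
```

Even with the offline pool completely full, a real‑time job is still admitted, because the two pools share no capacity.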

3. Updates and Deletions

Updates are handled by overwriting existing records (e.g., when an order's status changes). Deletions are handled by dropping partitions or replacing data with a new version.
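The sketch below shows overwrite‑style updates and partition‑drop deletes on a toy in‑memory table. OrderTable is a hypothetical name. The version comparison is an added assumption: it keeps late or out‑of‑order messages from rolling an order back to an older status. The article itself only says that records are overwritten.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

class OrderTable:
    """Toy table partitioned by day; rows are keyed by order_id within a partition."""
    def __init__(self) -> None:
        # partition (e.g. "2021-05-05") -> order_id -> (version, status)
        self.partitions: Dict[str, Dict[int, Tuple[int, str]]] = defaultdict(dict)

    def upsert(self, day: str, order_id: int, version: int, status: str) -> None:
        part = self.partitions[day]
        current = part.get(order_id)
        # Overwrite the existing row, but only with an equal or newer version (assumption).
        if current is None or version >= current[0]:
            part[order_id] = (version, status)

    def drop_partition(self, day: str) -> None:
        # Deleting a whole day is a metadata operation: no per-row deletes.
        self.partitions.pop(day, None)

    def status(self, day: str, order_id: int) -> Optional[str]:
        row = self.partitions.get(day, {}).get(order_id)
        return None if row is None else row[1]
```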

4. High Throughput

Solution: equip real‑time clusters with 10 GbE networking and SSDs, and offline clusters with HDDs.

5. Storage

Challenge: petabyte‑scale data cannot fit on a single node. Solution: distributed storage with a columnar format, compression (e.g., Snappy), and multi‑replica fault tolerance.
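The three storage ideas can be sketched in a few lines:
- pivoting rows into columns;
- compressing each column on its own, which works well because values within a column are similar;
- spreading each shard's replicas across distinct nodes.

Python's standard library has no Snappy codec, so zlib stands in for it here. The round‑robin placement is a simple assumed policy, not JD's.

```python
import json
import zlib
from typing import Dict, List

def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column (columnar layout)."""
    columns: Dict[str, list] = {name: [] for name in rows[0]}
    for row in rows:
        for name, values in columns.items():
            values.append(row[name])
    return columns

def compress_columns(columns: Dict[str, list]) -> Dict[str, bytes]:
    """Compress each column separately; zlib stands in for Snappy here."""
    return {name: zlib.compress(json.dumps(values).encode("utf-8")) for name, values in columns.items()}

def decompress_column(blob: bytes) -> list:
    return json.loads(zlib.decompress(blob).decode("utf-8"))

def place_replicas(shard_id: int, nodes: List[str], replicas: int = 3) -> List[str]:
    """Place a shard's replicas on distinct consecutive nodes, round-robin (assumed policy)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    start = shard_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]
```

A query that reads only the amount column decompresses only that column's blob. This is where columnar storage saves I/O compared with row storage.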

6. Consistency

Solution: distributed coordination (e.g., ZooKeeper) combined with local transaction mechanisms to ensure data consistency.
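One way to combine coordination with local transactions is a simplified commit‑then‑publish scheme:
- each replica installs a load atomically;
- a coordinator publishes the new version only after every replica succeeds;
- otherwise the coordinator rolls the load back everywhere.

This is an illustrative, single‑process sketch, not JD's protocol. In a real deployment the published version would live in ZooKeeper or a similar store. Replica and Coordinator are hypothetical names.

```python
from typing import Dict, List

class Replica:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.versions: Dict[int, List[dict]] = {}

    def apply(self, version: int, rows: List[dict]) -> bool:
        """Local transaction: the batch is installed whole or not at all."""
        if not self.healthy:
            return False
        self.versions[version] = list(rows)
        return True

    def rollback(self, version: int) -> None:
        self.versions.pop(version, None)

class Coordinator:
    """Stand-in for a ZooKeeper-like store that records the last published version."""
    def __init__(self, replicas: List[Replica]):
        self.replicas = replicas
        self.visible_version = 0

    def load(self, rows: List[dict]) -> bool:
        version = self.visible_version + 1
        results = [r.apply(version, rows) for r in self.replicas]
        if all(results):
            # Publish only after every replica succeeded, so readers never see a partial load.
            self.visible_version = version
            return True
        for r in self.replicas:
            r.rollback(version)
        return False

    def read(self, replica: Replica) -> List[dict]:
        """Readers see only versions up to the published one, whichever replica they hit."""
        return [row for v in sorted(replica.versions) if v <= self.visible_version
                for row in replica.versions[v]]
```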

7. Read

Techniques: partitioning by time, pre‑aggregation, indexing (hash, B‑tree, range, inverted), materialized views.
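Time partitioning and pre‑aggregation work together. A rollup maintained at write time answers aggregate queries without touching detail rows, and partition pruning skips every day outside the queried range. The sketch below illustrates the idea with a hypothetical SalesFacts table. Indexing and materialized views are not modeled.

```python
from collections import defaultdict
from datetime import date
from typing import Dict

class SalesFacts:
    """Toy fact table partitioned by day with a per-(day, store) rollup kept on write."""
    def __init__(self) -> None:
        self.rows: Dict[date, list] = defaultdict(list)  # raw detail rows, per partition
        self.rollup: Dict[date, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.partitions_scanned = 0

    def insert(self, day: date, store: str, amount: int) -> None:
        self.rows[day].append((store, amount))
        self.rollup[day][store] += amount  # pre-aggregation at ingest time

    def sales_by_store(self, start: date, end: date) -> Dict[str, int]:
        self.partitions_scanned = 0
        totals: Dict[str, int] = defaultdict(int)
        for day, per_store in self.rollup.items():
            if not (start <= day <= end):
                continue  # partition pruning: out-of-range days are never read
            self.partitions_scanned += 1
            for store, amount in per_store.items():
                totals[store] += amount  # read the rollup, not the raw rows
        return dict(totals)
```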

8. Usability

Solution: support JDBC/ODBC and standard SQL, and provide a graphical interface for analysts without database expertise.

9. QPS

Solution: partition cache, result cache, multi‑replica deployment, and scaling hardware.
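A result cache can stay correct under continuous ingestion if its key includes the version of every partition the query reads. A write to a partition bumps that partition's version, so old entries simply stop matching. This is a minimal sketch of that idea, not the Doris or ClickHouse cache. ResultCache and its methods are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

class ResultCache:
    """Result cache keyed by SQL text plus the version of each partition the query reads."""
    def __init__(self) -> None:
        self.partition_versions: Dict[str, int] = {}
        self._entries: Dict[Tuple, object] = {}
        self.hits = 0

    def on_partition_write(self, partition: str) -> None:
        # New data changes the partition's version, so cached keys that include it no longer match.
        self.partition_versions[partition] = self.partition_versions.get(partition, 0) + 1

    def query(self, sql: str, partitions: Iterable[str], execute: Callable[[], object]) -> object:
        key = (sql, tuple((p, self.partition_versions.get(p, 0)) for p in sorted(partitions)))
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        result = execute()
        self._entries[key] = result
        return result
```

Stale entries are never evicted here. A production cache would also need a size bound, such as LRU eviction.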

10. Management

Current issues: replacing failed disks and rebalancing data are manual and time‑consuming. Solutions: automated monitoring and alerting, blacklisting and removing faulty nodes, and scripted node replacement, which reduces downtime from hours to minutes.
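The blacklisting step can be sketched as a health monitor that removes a node from routing after a run of consecutive failed checks and raises an alert. The threshold of three and the NodeMonitor name are assumptions for illustration. Disk replacement and data rebalancing are not modeled.

```python
from typing import Dict, Iterable, List, Set

class NodeMonitor:
    """Blacklist a node after N consecutive failed health checks (threshold is hypothetical)."""
    def __init__(self, nodes: Iterable[str], max_failures: int = 3):
        self.failures: Dict[str, int] = {n: 0 for n in nodes}
        self.blacklist: Set[str] = set()
        self.max_failures = max_failures
        self.alerts: List[str] = []

    def report(self, node: str, healthy: bool) -> None:
        if node in self.blacklist:
            return  # already removed; awaits replacement
        if healthy:
            self.failures[node] = 0  # only consecutive failures count
            return
        self.failures[node] += 1
        if self.failures[node] >= self.max_failures:
            self.blacklist.add(node)  # stop routing queries and writes to it
            self.alerts.append(f"{node} blacklisted after {self.failures[node]} failed checks")

    def routable(self) -> List[str]:
        return [n for n in self.failures if n not in self.blacklist]
```

Counting only consecutive failures keeps a single transient timeout from evicting a healthy node.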

Evolution of JD's OLAP

1.0 Era: Order data volumes were small and were handled by relational databases (Oracle/MySQL).

2.0 Era: Logistics, supply‑chain, customer‑service, and payment data were added. As volumes grew to the TB/PB scale, offline warehouses were built on Hive and Spark.

3.0 Era: Real‑time queries were introduced, along with a unified OLAP service built on Doris and ClickHouse that serves both batch and streaming workloads.

Future Plans

Management Platform Optimization:

Dynamic scaling of ClickHouse nodes.

Intelligent operations to automate bringing nodes online and offline, and to balance data across nodes.

Real‑time cache enhancements in Doris.

Smart index management that automatically creates indexes based on query patterns.
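Since smart index management is only a future plan, any code here is speculative. One simple form it could take: count which columns appear in WHERE predicates across a query log, then suggest indexes on the frequently filtered ones that are not yet indexed. The regex below is a rough heuristic, not a SQL parser. recommend_indexes and its min_count threshold are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# Column names followed by a comparison operator or IN (rough heuristic, not a SQL parser).
_PREDICATE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*(?:!=|=|<|>|\bIN\b)", re.IGNORECASE)

def recommend_indexes(queries: Iterable[str], min_count: int = 2,
                      existing: Iterable[str] = ()) -> List[str]:
    existing_set = set(existing)
    counts: Counter = Counter()
    for sql in queries:
        where = re.search(r"\bWHERE\b(.*)", sql, re.IGNORECASE | re.DOTALL)
        if where is None:
            continue
        # Count each filtered column once per query.
        counts.update(set(_PREDICATE.findall(where.group(1))))
    # Most frequently filtered columns first, skipping ones already indexed.
    return [col for col, n in counts.most_common() if n >= min_count and col not in existing_set]
```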

Q&A Highlights:

JD has not adopted Druid because of its limited SQL support and its poor fit for rapidly changing order data.

ClickHouse excels in single‑table queries; Doris performs better on large joins and offers higher QPS in some scenarios.

Operational cost is lower for Doris because of automatic node scaling.

Data updates are simpler in Doris, which overwrites records, than in ClickHouse, where update handling depends on choosing among multiple table engines.

Automatic engine selection is not yet implemented; users currently choose manually.

Data reaches ClickHouse through one of two paths: legacy pipelines managed by users themselves, or the unified OLAP service built on the platform.

End of presentation – thank you for listening.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.