How JD Logistics Tackled Billion-Scale Data Challenges with Doris
This article details JD Logistics' journey from fragmented, massive‑scale data to a unified, real‑time analytics platform, covering business needs, pain points, tool evaluation, a new Doris‑based architecture, table management, data import procedures, automation scripts, and future roadmap for data engineering.
1. Business Scenario
JD Logistics operates a nationwide, integrated supply‑chain service that requires real‑time, multi‑dimensional data analysis across hundreds of warehouses and delivery points. The data team faces massive data volumes, inconsistent standards, duplicated efforts, slow response times, and a lack of unified data asset management.
Many: massive multi‑dimensional queries demand real‑time performance.
Scattered: data stored in disparate systems without standardization.
Heavy: repetitive reporting (daily, weekly, monthly) is inefficient.
Slow: diverse regional and product‑line data scenarios hinder rapid change.
Lacking: no unified data asset management for self‑service analytics.
Difficult: leadership struggles to obtain data, measure marketing ROI, and drive data‑centric decisions.
2. Current Needs
The ecosystem consists of:
Production system: supports daily business operations and generates raw production data.
Data warehouse: a strategic repository for analytical reporting and decision support.
Data mart: built on the warehouse and big‑data platform, serving various business groups (CFO, CMO, COO, Mobile, etc.).
Application system: products that leverage data to assist users in making better decisions.
3. Data Team Approach: Business‑Finance Data System
The team aims to bridge the natural gap between operational and financial data, standardizing metrics so that costs and revenues can be traced to each transaction, enabling fine‑grained, real‑time business‑finance analysis.
4. Problems Faced
4.1 Data Visualization
Data is exported to local machines about 3,000 times per week, and exports cannot be traced afterward. The short‑term fix adds a warning dialog and generates an export record for each download; the long‑term fix focuses on user‑driven methodology, offline reporting, and self‑service exploration.
4.2 Permission Management
Analysis permissions are overly broad (e.g., analysts can access all tables), and metric permissions are scattered across systems, leading to chaos. Solutions include tightening BDP access based on business characteristics and centralizing metric control via a unified data API.
5. Tool Evaluation
An evaluation team compared the in‑house tool JD Power (rapid iteration) with external BI tools (Tableau, Yonghong BI, etc.) on cost, maturity, usability, extensibility, and performance. Scores from business, product, and R&D stakeholders determined the final BI tool selection.
6. Solution Architecture
Existing stack: JD Power + Presto + BDP suffered from resource contention and slow query performance. The new architecture replaces Presto with Doris, which isolates resources, decouples BDP from reporting, and returns query results within seconds. Reported benefits include:
Query latency reduced from >10 seconds to sub‑second.
Independent resource control and on‑demand optimization.
7. Doris Table Management
Common operations:
Create table
Add partition
Drop partition
Key notes: standardize partition rules, avoid creating excessive Rollups, and import data in small, serial batches to optimize resource usage.
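The three operations above can be sketched as Doris DDL generated by a small Python helper. This is a minimal illustration under stated assumptions, not the team's actual scripts: the table name jddl_test.demo_sum, the columns, the p<YYYYMMDD> partition naming rule, the bucket count, and the replication setting are all hypothetical.

```python
from datetime import date, timedelta


def partition_name(day: date) -> str:
    """Name a daily partition p<YYYYMMDD> (an assumed naming rule)."""
    return f"p{day:%Y%m%d}"


def create_table_sql(table: str, first_day: date) -> str:
    """CREATE TABLE with one daily range partition; columns are hypothetical."""
    upper = first_day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    dt DATE,\n"
        "    vender_id BIGINT,\n"
        "    amount DECIMAL(18, 2)\n"
        ")\n"
        "DUPLICATE KEY(dt, vender_id)\n"
        "PARTITION BY RANGE(dt) (\n"
        f'    PARTITION {partition_name(first_day)} VALUES LESS THAN ("{upper:%Y-%m-%d}")\n'
        ")\n"
        "DISTRIBUTED BY HASH(vender_id) BUCKETS 10\n"
        'PROPERTIES ("replication_num" = "3");'
    )


def add_partition_sql(table: str, day: date) -> str:
    """A partition for one day holds values below the next day's date."""
    upper = day + timedelta(days=1)
    return (
        f"ALTER TABLE {table} ADD PARTITION IF NOT EXISTS {partition_name(day)} "
        f'VALUES LESS THAN ("{upper:%Y-%m-%d}")'
    )


def drop_partition_sql(table: str, day: date) -> str:
    """Drop the partition that holds the given day."""
    return f"ALTER TABLE {table} DROP PARTITION IF EXISTS {partition_name(day)}"


if __name__ == "__main__":
    print(create_table_sql("jddl_test.demo_sum", date(2020, 11, 1)))
    print(add_partition_sql("jddl_test.demo_sum", date(2020, 11, 2)))
    print(drop_partition_sql("jddl_test.demo_sum", date(2020, 11, 1)))
```

Deriving every partition name and boundary from the date in one place is one way to keep partition rules standardized across tables.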
8. Data Import from Hive to Doris (Broker Load)
Steps include converting Hive tables to Doris format, performing a Broker Load, and tracking load status.
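The conversion step is not detailed in the article. As a rough sketch of what translating Hive column types into Doris column types might look like, the helper below uses a mapping that is an assumption for illustration; the default VARCHAR length for Hive string columns and the handling of precision limits would need checking against the Doris version in use.

```python
import re

# Assumed mapping from Hive primitive types to Doris column types.
HIVE_TO_DORIS = {
    "tinyint": "TINYINT",
    "smallint": "SMALLINT",
    "int": "INT",
    "bigint": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "timestamp": "DATETIME",
}


def to_doris_type(hive_type: str, string_len: int = 255) -> str:
    """Translate one Hive column type into a Doris column type."""
    t = hive_type.strip().lower()
    if t in HIVE_TO_DORIS:
        return HIVE_TO_DORIS[t]
    if t == "string":
        # Hive strings are unbounded; the VARCHAR length here is a guess.
        return f"VARCHAR({string_len})"
    # Parameterized types keep their arguments: decimal(p,s), varchar(n), char(n).
    m = re.fullmatch(r"(decimal|varchar|char)\s*\((.+)\)", t)
    if m:
        return f"{m.group(1).upper()}({m.group(2).replace(' ', '')})"
    raise ValueError(f"no Doris mapping assumed for Hive type: {hive_type}")


if __name__ == "__main__":
    for hive_type in ("bigint", "string", "decimal(18, 2)", "timestamp"):
        print(hive_type, "->", to_doris_type(hive_type))
```

Raising on unknown types, rather than guessing, makes unsupported columns visible before a load is attempted.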
Example load‑status query:
show load from jddl_test where label = 'app_ea_pal_vender_all_sum_m_20201101_183213_19688970430' \G
Important parameters:
LABEL: identifies the import batch for later status queries.
max_filter_ratio: maximum fraction of rows that may be filtered out for data errors before the load fails (e.g., 0.2 for 20%).
timeout: load job timeout in seconds (default 86,400 s).
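To show where these parameters appear, the sketch below assembles a Broker Load statement and classifies the State value that SHOW LOAD returns. It is an illustration under assumptions: the HDFS path, broker name, ORC file format, and the label scheme (table name, timestamp, random suffix, loosely modeled on the example label above) are hypothetical, and the timeout default simply mirrors the value stated above.

```python
import random
from datetime import datetime


def make_label(table: str, now: datetime) -> str:
    """Build a unique label: table, timestamp, 11-digit random suffix (assumed scheme)."""
    return f"{table}_{now:%Y%m%d_%H%M%S}_{random.randint(0, 10**11 - 1):011d}"


def broker_load_sql(db: str, table: str, label: str, hdfs_path: str, broker: str,
                    max_filter_ratio: float = 0.2, timeout: int = 86400) -> str:
    """Assemble a Broker Load statement; the ORC format is an assumption."""
    return (
        f"LOAD LABEL {db}.{label}\n"
        "(\n"
        f'    DATA INFILE("{hdfs_path}")\n'
        f"    INTO TABLE {table}\n"
        '    FORMAT AS "orc"\n'
        ")\n"
        f'WITH BROKER "{broker}"\n'
        "PROPERTIES\n"
        "(\n"
        f'    "max_filter_ratio" = "{max_filter_ratio}",\n'
        f'    "timeout" = "{timeout}"\n'
        ");"
    )


def classify_state(state: str) -> str:
    """Map the State column of SHOW LOAD to done / failed / running."""
    s = state.strip().upper()
    if s == "FINISHED":
        return "done"
    if s == "CANCELLED":
        return "failed"
    if s in ("PENDING", "ETL", "LOADING"):
        return "running"
    raise ValueError(f"unexpected load state: {state}")


if __name__ == "__main__":
    label = make_label("demo_sum", datetime(2020, 11, 1, 18, 32, 13))
    print(broker_load_sql("jddl_test", "demo_sum", label,
                          "hdfs://namenode:8020/warehouse/demo_sum/dt=2020-11-01/*",
                          "hdfs_broker"))
    print(classify_state("LOADING"))
```

A polling loop would run the SHOW LOAD query for the label at intervals and stop once classify_state returns done or failed.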
9. Automated Data Push
Command‑line options for the automation script:
-t table name (required).
-c column list (optional, defaults to all columns).
-n number of days of data to push (default 1 day).
-e end date for data extraction (default yesterday).
-d Doris operation: db_reset (rebuild table), db_drop (delete table), db_create (create or show table DDL).
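The options above could be parsed along the following lines. This is a hedged sketch of the interface only, not the original script: the YYYY-MM-DD format for -e, the comma-separated form of -c, and the helper names are assumptions.

```python
import argparse
from datetime import date, datetime, timedelta


def parse_date(text: str) -> date:
    """Parse an end date given as YYYY-MM-DD (assumed format)."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Push Hive data to Doris (sketch)")
    p.add_argument("-t", dest="table", required=True, help="table name")
    p.add_argument("-c", dest="columns", default=None,
                   help="comma-separated column list; all columns if omitted")
    p.add_argument("-n", dest="days", type=int, default=1,
                   help="number of days of data to push")
    p.add_argument("-e", dest="end_date", type=parse_date, default=None,
                   help="end date; yesterday if omitted")
    p.add_argument("-d", dest="doris_op", default=None,
                   choices=["db_reset", "db_drop", "db_create"],
                   help="Doris table operation")
    return p


def dates_to_push(days: int, end: date) -> list:
    """Return the `days` consecutive dates ending at `end`, oldest first."""
    return [end - timedelta(days=i) for i in range(days - 1, -1, -1)]


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Default the end date to yesterday when -e is not given.
    end = args.end_date or date.today() - timedelta(days=1)
    for day in dates_to_push(args.days, end):
        print(f"would push {args.table} for {day}")
```

Resolving the default end date at run time, rather than in the parser, keeps a long-lived process from reusing a stale "yesterday".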
Note: Different database characteristics create integration bottlenecks; new technologies must be introduced with thorough pre‑planning.
10. Automated Reporting
By connecting JD Power to Doris, business users can configure data sources and build analysis reports within ten minutes, achieving a one‑stop platform for data preparation, report generation, and interactive analysis across PC, iPhone, iPad, and Android.
11. Future Plans
11.1 Offline Data Technology Upgrade
BDP will continue to evolve with optimized underlying models, lifecycle management for data tables, and smarter scheduling (Hive/Spark) to balance resource utilization.
11.2 Business‑Driven Technical Iteration
As business matures, finer‑grained operations demand a unified, systematic, clear, and flexible data layer that supports ad‑hoc queries and multi‑dimensional OLAP analysis.
11.3 Team Building
Focus areas include methodology development, technical skill enhancement for end‑to‑end offline‑real‑time pipelines, project management, fine‑grained data permissions, and talent pipeline construction.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
