Exploring JD Logistics’ Billion‑Scale Data Management and Analytics with Apache Doris
This article details JD Logistics’ challenges in handling petabyte‑level data, outlines their existing data architecture, and explains how they adopted Apache Doris for faster, scalable analytics, covering table management, data import workflows, visualization tools, and future roadmap for data engineering.
JD Logistics, a leading e‑commerce logistics provider, faces massive data volumes generated by its integrated supply‑chain services, requiring real‑time, multi‑dimensional analysis to improve operational efficiency and customer experience.
The current data landscape includes production systems, a data warehouse, data marts, and various application systems, but suffers from issues such as high query latency, fragmented data standards, redundant reporting, and insufficient data governance.
To address these pain points, the data team introduced a new analytics stack based on Apache Doris, replacing the previous Presto‑BDP combination. Doris provides column‑store, in‑memory computation, and isolated resource pools, delivering sub‑second query responses on billion‑row datasets.
Key Doris operations are demonstrated with the following SQL snippets:
ALTER TABLE table_name ADD PARTITION IF NOT EXISTS p20200803 VALUES [('2020-08-03'), ('2020-08-04')]; TRUNCATE TABLE table_name PARTITION(p20200803,p20200804); show load from jddl_test where label = 'app_ea_pal_vender_all_sum_m_20201101_183213_19688970430' \GTable management in Doris includes creating tables, adding/removing partitions, and careful handling of roll‑up tables to avoid performance degradation.
Data ingestion is performed via Hive‑to‑Doris broker load, with parameters such as max_filter_ratio (error tolerance) and timeout (job timeout) to control quality and execution time.
Automation scripts enable scheduled data pushes, allowing users to specify target tables, columns, time windows, and database actions (create, reset, drop) through command‑line flags like -t, -c, -n, -e, and -d.
Beyond ingestion, the team built self‑service reporting capabilities using the JD Power BI platform, enabling business users to create dashboards within ten minutes, with cross‑device support for PC, iPhone, iPad, and Android.
The roadmap includes further offline data engine upgrades, tighter integration of business and technical roadmaps, and scaling the data team through methodology, talent development, and robust data‑permission management.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
