
StarRocks-Based Unified Data Service and Analytics Platform at JD Logistics

JD Logistics uses StarRocks to build Udata, a unified query engine that addresses data silos, slow queries, and high maintenance costs by integrating data services and analytics. It enables low‑code data service generation, fast federated queries, and real‑time updates, with data‑lake integration and resource isolation planned as future work.

DataFunTalk

JD Logistics, a leading technology‑driven supply‑chain provider, has built a comprehensive intelligent logistics system based on 5G, AI, big data, cloud computing, and IoT, operating 43 large "Asia No.1" smart warehouses and holding over 5,500 patents and software copyrights as of December 2021.

Its existing platform suffered from data silos, poor query performance, difficult maintenance, and low development efficiency. To address these problems, JD Logistics built the Udata unified query engine on a StarRocks‑based federated query solution, which reduced development and operation costs and removed the separation between data services and analytics.

The Udata platform abstracts metric generation into low‑code configurations, allowing non‑engineers to publish data services within 30 minutes, and provides a unified metric management system and data map for intuitive metric discovery and reuse.
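To make the low‑code idea concrete, here is a minimal sketch of how a declarative metric configuration could be turned into a parameterized query. The `MetricConfig` fields, the `build_query` helper, and the table and column names are all hypothetical illustrations, not Udata's actual schema or API.

```python
from dataclasses import dataclass, field

ALLOWED_AGGREGATES = {"SUM", "COUNT", "AVG", "MIN", "MAX"}


@dataclass
class MetricConfig:
    """Hypothetical declarative description of one metric (not Udata's real schema)."""
    name: str
    table: str
    aggregate: str
    column: str
    dimensions: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)  # column -> query parameter name


def build_query(cfg: MetricConfig) -> str:
    """Turn a metric configuration into a parameterized SQL string."""
    agg = cfg.aggregate.upper()
    if agg not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {cfg.aggregate}")
    select_cols = list(cfg.dimensions) + [f"{agg}({cfg.column}) AS {cfg.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {cfg.table}"
    if cfg.filters:
        # Values are bound at execution time; only placeholders appear in the SQL.
        predicates = [f"{col} = %({param})s" for col, param in cfg.filters.items()]
        sql += " WHERE " + " AND ".join(predicates)
    if cfg.dimensions:
        sql += " GROUP BY " + ", ".join(cfg.dimensions)
    return sql
```

Under this assumption, publishing a new metric means writing a configuration entry rather than a new service, which is what would let non‑engineers do it quickly.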

StarRocks delivers exceptional query performance for both wide tables and multi‑table joins, supports real‑time data ingestion, and powers the unified query engine that combines analysis and service layers, enabling immediate transformation of analytical results into online data services.

Key technical innovations include:

Vectorized execution, boosting operator performance by 3‑10×.

Materialized view acceleration, improving aggregation queries on billion‑row tables by over 10×.

Cost‑Based Optimizer (CBO) for optimal plan selection.

Adaptive low‑cardinality optimization using global dictionaries.
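As an illustration of the last item, the sketch below shows the general idea behind dictionary encoding for a low‑cardinality string column: each distinct value is mapped to a small integer, and grouping runs on the integers instead of the strings. This is a simplified, single‑process model; the function names are invented for the example and do not reflect StarRocks internals.

```python
def build_dictionary(values):
    """Map each distinct string to a small integer code (sorted for determinism)."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def encode(values, dictionary):
    """Replace every string with its integer code."""
    return [dictionary[value] for value in values]


def group_count_encoded(codes, dictionary):
    """Count rows per group using integer codes, decoding only the final result."""
    counts = [0] * len(dictionary)
    for code in codes:
        counts[code] += 1  # array index instead of string hashing and comparison
    decode = {code: value for value, code in dictionary.items()}
    return {decode[code]: n for code, n in enumerate(counts) if n}
```

The dictionary is built once and shared, so the per‑row work during aggregation touches only integers; strings reappear only in the final, much smaller result.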

For multi‑table joins, StarRocks supports several distributed join strategies (broadcast, shuffle, bucket shuffle, colocate, and replicated). Combined with the CBO, which picks a strategy per query, the article reports join performance 3‑5× faster than ClickHouse.
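The difference between two of these strategies can be sketched with in‑memory lists standing in for cluster nodes. This is a toy model for illustration only; the helper names are invented and the code does not mirror how StarRocks implements its join operators.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Local hash join: build an index on the right side, probe with the left."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def broadcast_join(left_partitions, right_rows, key):
    """Each node keeps its left partition and receives a full copy of the right table."""
    out = []
    for partition in left_partitions:
        out.extend(hash_join(partition, right_rows, key))
    return out


def shuffle_join(left_rows, right_rows, key, nodes):
    """Both sides are repartitioned by hash(key) so matching rows meet on one node."""
    left_parts = [[] for _ in range(nodes)]
    right_parts = [[] for _ in range(nodes)]
    for row in left_rows:
        left_parts[hash(row[key]) % nodes].append(row)
    for row in right_rows:
        right_parts[hash(row[key]) % nodes].append(row)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(hash_join(lp, rp, key))
    return out
```

Broadcast copies the right table to every node, so it suits a small dimension table; shuffle moves both sides across the network but never replicates either. Choosing between them from table statistics is the kind of decision a cost‑based optimizer makes.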

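Federated query extensions push aggregations down to external engines such as MySQL, Elasticsearch, and ClickHouse. The external engine computes the aggregate itself and returns only the result, which reduces the volume of data transferred and makes use of that engine's own aggregation capabilities.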
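A small simulation shows why push‑down cuts data transfer. `ExternalSource` is a hypothetical in‑memory stand‑in for an external engine, not a real connector; the point is only to compare how many rows cross the boundary in each mode.

```python
class ExternalSource:
    """In-memory stand-in for an external engine (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self):
        """No push-down: ship every row to the query engine."""
        return list(self.rows)

    def aggregate(self, group_col, value_col):
        """Push-down: compute SUM per group locally and ship only the totals."""
        totals = {}
        for row in self.rows:
            totals[row[group_col]] = totals.get(row[group_col], 0) + row[value_col]
        return totals


def sum_without_pushdown(sources, group_col, value_col):
    """Pull all rows, then aggregate in the query engine."""
    totals, transferred = {}, 0
    for src in sources:
        rows = src.scan()
        transferred += len(rows)
        for row in rows:
            totals[row[group_col]] = totals.get(row[group_col], 0) + row[value_col]
    return totals, transferred


def sum_with_pushdown(sources, group_col, value_col):
    """Ask each source for partial sums, then merge the partials."""
    totals, transferred = {}, 0
    for src in sources:
        partial = src.aggregate(group_col, value_col)
        transferred += len(partial)
        for group, value in partial.items():
            totals[group] = totals.get(group, 0) + value
    return totals, transferred
```

Both functions return the same totals, but with push‑down the number of transferred records is bounded by the number of groups per source rather than the number of rows. SUM merges trivially; an aggregate such as AVG would need each source to return a sum and a count.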

Real‑time update strategies discussed include:

Merge‑on‑read (used by StarRocks aggregation models and ClickHouse ReplacingMergeTree).

Copy‑on‑write (used by Apache Hudi, Iceberg).

Delete‑and‑insert (upsert) with primary‑key indexing, offering superior query performance and concurrency.
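The contrast between the first and third strategies can be sketched with two toy in‑memory tables. Both classes are simplified illustrations written for this summary, not the storage engines' actual implementations; copy‑on‑write is omitted because it rewrites whole files, which this model does not represent.

```python
class MergeOnReadTable:
    """Writes only append; every read must merge versions to find the latest row."""

    def __init__(self):
        self.versions = []  # (key, value) pairs in write order

    def upsert(self, key, value):
        self.versions.append((key, value))  # cheap write, no lookup

    def scan(self):
        latest = {}
        for key, value in self.versions:  # merge cost is paid on every read
            latest[key] = value
        return latest


class DeleteAndInsertTable:
    """Writes use a primary-key index to mark the old row deleted, then append."""

    def __init__(self):
        self.rows = []      # (key, value) pairs
        self.deleted = []   # one delete flag per row
        self.pk_index = {}  # key -> position of the live row

    def upsert(self, key, value):
        old_position = self.pk_index.get(key)
        if old_position is not None:
            self.deleted[old_position] = True  # the "delete" half
        self.pk_index[key] = len(self.rows)
        self.rows.append((key, value))         # the "insert" half
        self.deleted.append(False)

    def scan(self):
        # No merging by key: reads simply skip rows flagged as deleted.
        return {k: v for (k, v), dead in zip(self.rows, self.deleted) if not dead}
```

The two tables return identical results, but delete‑and‑insert moves the deduplication work from read time to write time, which is the reason given for its better query performance.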

Future directions involve building a data‑lake‑centric batch‑stream unified storage, expanding supported data sources (e.g., Redis, HBase), implementing cross‑cluster federated queries, and enhancing resource isolation via StarRocks resource groups.

The article concludes by thanking the StarRocks community and inviting further collaboration.

Tags: big data, real-time analytics, database, StarRocks, data integration, Unified Query Engine
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
