
How JD Logistics Boosted Query Speed and Cut Costs with StarRocks Storage‑Compute Separation

JD Logistics transformed its one-stop self-service analytics platform, UData, by migrating from an integrated storage-compute architecture to a storage-compute separated design powered by StarRocks, achieving sub-10-second P95/P99 query latency, cutting storage costs by roughly 90%, and reducing compute expenses by about 30% while continuing to support massive data volumes.


In a presentation at the StarRocks Annual Summit, senior JD Logistics technologist Kang Qi described the evolution of the UData analytics platform from a tightly coupled storage‑compute model to a separated architecture, highlighting significant performance gains and cost reductions.

UData Platform Overview

UData is a one‑stop, self‑service data analysis platform that enables frontline staff (about 100,000 daily users) to acquire, process, and share data without manual extraction or Excel stitching. Users configure data sources via a visual interface or SQL, and results are automatically pushed to internal systems or email.

Underlying Architecture

The platform relies on StarRocks for its core capabilities: real-time data ingestion, federated queries, and lake-house integration. Data sources include real-time streams (via Flink from JDQ/JMQ), relational databases (MySQL, Oracle), OLAP engines (ClickHouse, Elasticsearch), APIs, and offline warehouses (Hive, Hudi). Federated queries are handled through StarRocks catalogs, while the data-management layer registers both internal and external tables and provides a data map for metadata.
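
As a hedged illustration of the federated-query piece, the sketch below registers a Hive external catalog and then joins it with an internal StarRocks table in a single statement. The metastore URI, database, and table names are placeholders, not JD's actual configuration.

    -- Register an external Hive catalog so Hive/Hudi data can be queried in place.
    -- Endpoint and names are illustrative assumptions.
    CREATE EXTERNAL CATALOG hive_catalog
    PROPERTIES (
        "type" = "hive",
        "hive.metastore.uris" = "thrift://metastore-host:9083"
    );

    -- A federated query can then join an internal table with an external one.
    SELECT o.order_id, o.status, d.warehouse_name
    FROM default_catalog.udata_db.realtime_orders AS o
    JOIN hive_catalog.ods.dim_warehouse AS d
      ON o.warehouse_id = d.warehouse_id
    LIMIT 100;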

Why Move to Storage‑Compute Separation

Growing storage demand: The business needs real-time writes at massive volume plus long-term retention (1-2 years) for period-over-period comparisons.

Storage cost pressure: SSD-based storage in the integrated model became prohibitively expensive, while OSS object storage dramatically lowers per-TB cost.

Lack of elastic scaling: Integrated clusters cannot auto-scale during peak events (e.g., large promotions), forcing over-provisioning and higher OPEX.

Cloud-native deployment: Separation allows compute and storage to scale independently, fitting Kubernetes-native operations.

Deployment Details

The separated cluster runs on Kubernetes in JD Cloud (JDOS). High-performance nodes equipped with 10 GbE NICs and SSDs serve compute, while OSS provides the storage tier. StarRocks Operator automates FE and CN specifications via CRD files, supporting instance types such as 16-core/64 GB and 32-core/128 GB. Dual-AZ deployment with StarRocks Proxy ensures high availability.

Storage‑Volume and OSS Bucket Mapping

Each logical storage volume maps to an OSS bucket. Large tables receive dedicated buckets to guarantee throughput, while smaller tables share buckets based on traffic forecasts. A metadata table tracks table‑to‑bucket relationships, compensating for the lack of direct visibility in Information Schema.
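The sketch below shows how such a mapping might be expressed in StarRocks' shared-data mode: a storage volume pointing at a dedicated OSS bucket (reached through the S3-compatible interface), and a cloud-native table bound to that volume at creation time. Bucket names, endpoints, credentials, and the table schema are illustrative assumptions, not JD's actual definitions.

    -- One storage volume per dedicated bucket for a large table.
    -- The OSS endpoint, region, and credentials are placeholders.
    CREATE STORAGE VOLUME vol_orders_large
    TYPE = S3
    LOCATIONS = ("s3://udata-orders-large/")
    PROPERTIES (
        "aws.s3.endpoint"   = "<oss-s3-compatible-endpoint>",
        "aws.s3.region"     = "<region>",
        "aws.s3.access_key" = "<access-key>",
        "aws.s3.secret_key" = "<secret-key>"
    );

    -- Bind a cloud-native table to that volume when it is created;
    -- smaller tables would point at a shared volume instead.
    CREATE TABLE udata_db.fact_orders (
        dt DATE,
        order_id BIGINT,
        warehouse_id INT,
        amount DECIMAL(18, 2)
    )
    DUPLICATE KEY (dt, order_id)
    PARTITION BY date_trunc('day', dt)
    DISTRIBUTED BY HASH (order_id)
    PROPERTIES ("storage_volume" = "vol_orders_large");

The table-to-bucket metadata table mentioned above simply records which table was created on which volume/bucket, since this mapping is not directly visible in Information Schema.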

Real‑Time Ingestion and Stream Load

All real‑time data is ingested via Flink using StarRocks Stream Load. Low‑code pipelines allow configuration in 1–5 minutes. Optimizations include reverse‑lookup of StarRocks catalog schemas for Kafka streams, automatic SQL generation for ETL, and relaxed batch buffers to keep end‑to‑end latency within 3–5 minutes.

Performance and Cost Evaluation

Write throughput: A 5-node separated cluster matches the write throughput of a comparable integrated cluster, thanks to the Batch Publish Version optimization.

Query latency: With a 20-day cache TTL, cache-hit queries achieve P95/P99 latency under 10 seconds, comparable to integrated clusters; cache-miss queries stay below 1 minute, far outperforming Hive. (The cache properties involved are sketched after this list.)

Storage cost: OSS reduces per-TB storage cost by ~90% versus local SSD.

Compute cost: Horizontal Pod Autoscaler (HPA) cuts compute expenses by ~30% under similar query loads.
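
For the cache behavior referenced under query latency, the sketch below shows the data-cache properties a shared-data table might carry; property names and value formats vary across StarRocks versions, and the table definition itself is illustrative.

    -- Illustrative cloud-native table with a bounded local data cache window.
    CREATE TABLE udata_db.fact_orders_cached (
        dt DATE,
        order_id BIGINT,
        amount DECIMAL(18, 2)
    )
    DUPLICATE KEY (dt, order_id)
    PARTITION BY date_trunc('day', dt)
    DISTRIBUTED BY HASH (order_id)
    PROPERTIES (
        "storage_volume" = "vol_orders_large",
        "datacache.enable" = "true",                 -- keep hot data on local SSD
        "datacache.partition_duration" = "20 DAY"    -- only recent partitions stay cached
    );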

Stability, Compaction, and Vacuum Tuning

Compaction threads and queues were increased, and Base/Cumulative compaction triggers were adjusted to minimize interference with bulk writes. Compaction scores are monitored with alerts; StarRocks 3.3 adds configurable ingestion-slowdown thresholds. Vacuum mechanisms clean up obsolete OSS objects; thread-pool sizes and parallelism were tuned, and grace periods were extended during peak query windows to avoid premature metadata deletion.
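
The admin commands below illustrate the kind of knobs involved. The config names appear in recent shared-data releases, but their availability, defaults, and suitable values vary by version, so treat them as assumptions rather than recommendations.

    -- Allow more concurrent compaction tasks so bulk writes are digested sooner.
    ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "64");

    -- Lengthen the vacuum grace period (minutes) during peak query windows so
    -- objects still referenced by long-running queries are not removed too early.
    ADMIN SET FRONTEND CONFIG ("lake_autovacuum_grace_period_minutes" = "30");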

Partition Query Hard Limits

To prevent full‑table scans, a selectedPartitionNumLimit (e.g., 31) is set in CloudNativeTable definitions. Queries exceeding this limit are rejected with clear errors, except for optimizer‑required scans such as statistics collection.

Statistics Collection Optimization

Full‑volume statistics are now scheduled during off‑peak night windows, while incremental collection runs during business hours, reducing impact on query latency for large‑partition tables.

Future Plans

Roadmap focuses on expanding data migration to the separated cluster, building a KV‑style catalog to ingest Redis/HBase data, and collaborating with the StarRocks community on features like column‑row hybrid storage and GIN indexes. Cost‑saving strategies include merging Stream Load tasks, proactive caching (e.g., Redis for hot dashboards), and finer‑grained autoscaling with Vertical Pod Autoscaler (VPA).

For more technical details, see the original StarRocks documentation and the JD Logistics case study linked in the source.

Tags: performance optimization, Kubernetes, StarRocks, Data Platform, cost reduction, Storage-Compute Separation
Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
