Databases 11 min read

StarRocks Practice Experience and SQL Optimization Cases in JD Daojia Data Platform

This article presents JD Daojia's data platform built on StarRocks, detailing business background, challenges such as pagination inconsistency, case‑when performance, and array‑type issues, and provides concrete SQL solutions, best‑practice tips, and future optimization directions.

Dada Group Technology

Dec 24, 2021

StarRocks Practice Experience and SQL Optimization Cases in JD Daojia Data Platform

JD Daojia, a leading local instant retail platform, built a full‑channel digital solution (HaiBo system) that integrates product, user, marketing, fulfillment and data modules, supporting over 100 billion‑level data analysis.

To meet real‑time and offline analytics, the data middle‑platform stores source data in MySQL, then processes and loads it into a unified warehouse using both MySQL and StarRocks, with stream load for real‑time and broker load for batch data; more than 80 % of the stored data resides in StarRocks.

Challenges and solutions

1. Pagination results differed when ordering by a low‑cardinality column (create_date). Because StarRocks distributes tasks across BE nodes, inconsistent ordering caused duplicate rows across pages. The fix was to add a high‑cardinality surrogate ID column to the ORDER BY clause.

CREATE TABLE `api_hw_prd_sku_brand_sku_di` (
  `create_date` date NOT NULL COMMENT "日",
  `upc` VARCHAR(255) NOT NULL COMMENT "upc",
  `stk_avl_cnt` BIGINT(20) NOT NULL COMMENT "",
  ...
) ENGINE = OLAP DUPLICATE KEY (`create_date`)
COMMENT "商品表"
PARTITION BY RANGE (`create_date`)(
  PARTITION p20200726 VALUES [('2020-07-26'), ('2020-07-27')]
)
DISTRIBUTED BY HASH (`brand_trademark_id`, `vender_id`) BUCKETS 10
PROPERTIES ("replication_num" = "3", ...);

Pagination query using UPC:

select stk_avl_cnt from api_hw_prd_sku_brand_sku_di
where create_date = 'xx' order by upc desc limit 'xx','xx';

Pagination query using create_date:

select stk_avl_cnt from api_hw_prd_sku_brand_sku_di
where create_date = 'xx' order by create_date desc limit 'xx','xx';

2. Multiple CASE WHEN expressions caused a 4.5 s query on a billion‑row table. By splitting the logic into separate parallel queries and aggregating the results, execution time dropped to 248 ms, an 18‑fold improvement.

3. Using ARRAY type for product attributes reduced one‑to‑many joins, but a bug in version 1.17 caused mismatched results when a numeric field was passed as a string; the issue was fixed in version 1.18.

Additional SQL best‑practice tips for StarRocks include: always apply an aggregate function to non‑grouped columns, and include a high‑cardinality ORDER BY column when using LIMIT for pagination.

Future work focuses on materialized views for hot queries, aggregation models to reduce storage, colocation joins to lower network latency, and expanding StarRocks as the unified analytics layer.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Case Study Big Data StarRocks Data Warehouse sql-optimization Database Performance

Written by

Dada Group Technology

Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.