Big Data 25 min read

Optimizing StarRocks Tables: Design Tips, Real‑World Cases and Monitoring Strategies

This article explains how to design efficient StarRocks tables with proper field types, partitioning and bucketing, compares update and primary‑key models, presents real‑world cases of memory and tablet issues, provides a complete table‑creation example, and outlines comprehensive monitoring metrics to keep the analytical data warehouse performant and stable.

Weimob Technology Center

Oct 13, 2023

Optimizing StarRocks Tables: Design Tips, Real‑World Cases and Monitoring Strategies

StarRocks is a high‑performance analytical data warehouse that uses vectorized MPP, CBO, materialized views and real‑time columnar storage, compatible with MySQL and BI tools.

StarRocks Table Overview

Table creation is crucial; proper design can boost query performance by several times. StarRocks tables use columnar storage with partition and bucket mechanisms, where a table is divided into partitions and each partition into tablets replicated across BE nodes.

StarRocks table partition and tablet illustration

Table Design

Field Types

Define appropriate field types: use NOT NULL when possible and prefer numeric columns over strings.

Partition and Bucket

StarRocks uses a Range‑Hash combination. Partition key should be a date or integer column; keep partition size <100 GB. Bucket key should be high‑cardinality columns; typical bucket count 4‑8, each tablet size 100 MB‑1 GB.

Primary‑Key vs Update Model

The update model does not maintain a primary‑key index, while the primary‑key model supports real‑time updates, predicate push‑down and indexes, but consumes more memory.

Primary‑key tables load the entire primary‑key index into memory per partition; avoid excessive primary‑key columns and keep primary‑key length <127 bytes.

Case Studies

Case 1: Update memory overflow causing primary‑key import failure

Background: import tasks failed with OutOfMemory errors. Investigation showed the update memory limit (27 GB) was exceeded because tables without partitions loaded the whole table into memory.

Caused by: java.io.IOException: com.starrocks.connector.flink.manager.StarRocksStreamLoadFailedException: Failed to flush data to StarRocks, Error response:
{"Status":"Fail","Message":"close index channel failed, load_id=bc4eb485-f99b-1f20-ffd9-e7e7e22925ab"}

Solution: switch the table to the update model and add appropriate partitions.

Case 2: Primary‑key length exceeds limit

Background: import failed because the primary‑key length exceeded the 127‑byte limit. Removing one primary‑key column resolved the issue.

CREATE TABLE xxxx (
  pid BIGINT NOT NULL COMMENT '企业ID',
  task_id VARCHAR(64) NOT NULL COMMENT '任务ID',
  userwid VARCHAR(64) NOT NULL COMMENT '用户ID',
  username VARCHAR(150) NOT NULL COMMENT '用户姓名',
  sales_talk_id VARCHAR(64) NOT NULL COMMENT '线路ID',
  gap VARCHAR(50) NOT NULL COMMENT '通话间隔',
  gap_time VARCHAR(50) NOT NULL COMMENT '通时间隔',
  count BIGINT NOT NULL COMMENT '数量',
  create_time DATETIME NOT NULL COMMENT '创建时间',
  update_time DATETIME NOT NULL COMMENT '更新时间'
) PRIMARY KEY (pid, task_id, userwid, username, sales_talk_id, gap, gap_time) DISTRIBUTED BY HASH(pid) BUCKETS 8;

Case 3: Primary‑key memory optimization

The cluster had about 50 primary‑key tables consuming ~48 GB of BE memory. Converting unused primary‑key tables to the update model, adding partitions for hot data, and deleting stale data reduced memory usage and improved write/query performance.

Case 4: Tablet number reduction

The cluster originally had 1.4 M tablets (average 2.6 MB each). By changing daily partitions to monthly and reducing bucket count, tablets were reduced to 130 k, decreasing metadata size and FE memory consumption.

Table Creation Example

CREATE TABLE table1 (
  dd DATE NOT NULL COMMENT 'date',
  bos_id BIGINT NOT NULL COMMENT 'organization id',
  vid BIGINT NOT NULL COMMENT 'node id',
  path VARCHAR(65533) NULL COMMENT 'parent path',
  prod_id BIGINT NOT NULL COMMENT 'product id',
  prod_inst_id BIGINT NOT NULL COMMENT 'product instance id',
  st TINYINT NOT NULL COMMENT 'stat type',
  consign_type TINYINT NOT NULL COMMENT 'delivery status',
  order_no BIGINT NOT NULL COMMENT 'order number',
  merchant_id BIGINT NOT NULL COMMENT 'merchant id',
  vid_type BIGINT NOT NULL COMMENT 'node type',
  consign_vid BIGINT NULL COMMENT 'handler vid',
  create_time DATETIME NOT NULL COMMENT 'creation time'
) ENGINE=OLAP
UNIQUE KEY (dd, bos_id, vid, path, prod_id, prod_inst_id, st, consign_type, order_no)
PARTITION BY RANGE(dd) (
  PARTITION p2022 VALUES [('1970-01-01'), ('2023-01-01')),
  PARTITION p2023 VALUES [('2023-01-01'), ('2024-02-01')),
  PARTITION p202401 VALUES [('2023-02-01'), ('2024-03-01'))
)
DISTRIBUTED BY HASH(bos_id, order_no) BUCKETS 6
PROPERTIES (
  "replication_num" = "3",
  "dynamic_partition.enable" = "true",
  "dynamic_partition.time_unit" = "MONTH",
  "dynamic_partition.time_zone" = "Asia/Shanghai",
  "dynamic_partition.start" = "-2147483648",
  "dynamic_partition.end" = "1",
  "dynamic_partition.prefix" = "p",
  "dynamic_partition.buckets" = "6",
  "dynamic_partition.start_day_of_month" = "1",
  "in_memory" = "false",
  "storage_format" = "DEFAULT"
);

Monitoring Overview

StarRocks monitoring uses Prometheus + Grafana with AlertManager. Metrics cover cluster status, FE JVM, BE CPU/memory, tablet distribution, compaction, query latency, load jobs, and transaction statistics.

Key metrics include cluster FE/BE status, JVM heap, CPU idle, memory usage, tablet distribution, scheduling tablets, query RPS/QPS, 99th latency, slow queries, load job counts, compaction rates, scan bytes/rows, and transaction rates.

Conclusion

Effective table design, proper partitioning and bucketing, careful choice between update and primary‑key models, and vigilant monitoring are essential for maintaining StarRocks performance and stability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Analytics StarRocks partitioning Primary Key Table Design

Written by

Weimob Technology Center

Official platform of the Weimob Technology Center

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

StarRocks Table Overview

Table Design

Field Types

Partition and Bucket

Primary‑Key vs Update Model

Case Studies

Case 1: Update memory overflow causing primary‑key import failure

Case 2: Primary‑key length exceeds limit

Case 3: Primary‑key memory optimization

Case 4: Tablet number reduction

Table Creation Example

Monitoring Overview

Conclusion

Weimob Technology Center

How this landed with the community

Was this worth your time?

0 Comments

Case 1: Update memory overflow causing primary‑key import failure

Case 2: Primary‑key length exceeds limit

Case 3: Primary‑key memory optimization

Case 4: Tablet number reduction