How Cloud‑Native Persistent Index Boosts StarRocks Performance 10× in Elastic Scheduling

StarRocks 3.3.1 introduces a cloud‑native persistent index that stores index files in object storage, which removes local‑disk constraints and supports elastic scaling. In elastic scheduling tests it cuts import latency roughly ten‑fold compared with the local‑disk index, and it matches the local‑disk index in batch and real‑time imports.

StarRocks

Background

StarRocks 3.3.1 introduces a cloud‑native persistent index that stores index files in object storage and accesses them through a unified local data cache. This design eliminates disk‑space contention, enables elastic scaling, and removes the need for large local disks on compute nodes.

Key Advantages

Index files reside in object storage; a local data cache provides fast access and unified cache management.

Elastic scaling: after tablet migration only a small amount of data needs to be fetched to rebuild the index, preserving real‑time import latency.

No requirement for compute nodes to mount local disks, simplifying deployment and reducing hardware cost.

Performance Evaluation

Three scenarios were tested: TPCH 100 GB batch import, small‑batch real‑time import of an order table, and an elastic scheduling test where a BE node is stopped, its local index files are cleared, and the tablet is rescheduled to a new node.

Test Environment

FE: 1 × ecs.g6.xlarge (4 vCPUs, 16 GB memory), PL0 disk
BE: 1 × ecs.g6.4xlarge (16 vCPUs, 64 GB memory), PL1 disk

Version

Version: branch-3.3-2b87854

TPCH 100 GB Batch Import

The tables lineitem_pk and orders_pk were created with persistent_index_type set to either 'LOCAL' or 'CLOUD_NATIVE', so that the same import could be run against each index type. The import results are shown below.

Figure: TPCH batch import results

Small‑Batch Real‑Time Import

A table tbl_pk with a composite primary key was created. 7.6 GB of random order data, split into 100 files, was loaded file by file through the HTTP PUT API (Stream Load). The import results are shown below.

Figure: Real‑time import results
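
The small‑batch load described above can be approximated with a short script. The following is a minimal Python sketch, not the benchmark's actual tooling: the three CSV columns, the file names, the host fe.example.internal, and the database name demo are all assumptions, since the article does not give the schema of tbl_pk. The endpoint path and the label and column_separator headers follow the StarRocks Stream Load HTTP interface, which accepts data through PUT requests.

```python
import base64
import csv
import random
import tempfile
import urllib.request
from datetime import date, timedelta
from pathlib import Path
from typing import List


def write_order_files(out_dir: Path, num_files: int, rows_per_file: int,
                      seed: int = 0) -> List[Path]:
    """Write random order rows as CSV, split across num_files files.

    The columns (order id, order date, amount) are hypothetical.
    """
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    paths = []
    order_id = 0
    for i in range(num_files):
        path = out_dir / f"orders_{i:03d}.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            for _ in range(rows_per_file):
                order_id += 1
                order_date = start + timedelta(days=rng.randrange(365))
                amount = round(rng.uniform(1, 1000), 2)
                writer.writerow([order_id, order_date.isoformat(), amount])
        paths.append(path)
    return paths


def build_stream_load_request(fe_host: str, db: str, table: str,
                              payload: bytes, label: str,
                              user: str = "root", password: str = "",
                              http_port: int = 8030) -> urllib.request.Request:
    """Build (but do not send) a Stream Load PUT request for one file."""
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,              # unique label per load transaction
        "column_separator": ",",     # the files above are comma-separated
    }
    return urllib.request.Request(url, data=payload, headers=headers,
                                  method="PUT")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_order_files(Path(tmp), num_files=100, rows_per_file=10)
        req = build_stream_load_request("fe.example.internal", "demo",
                                        "tbl_pk", files[0].read_bytes(),
                                        label="orders_000")
        print(req.get_method(), req.full_url)
```

The sketch only builds the request. Stream Load requests sent to the FE are typically redirected to a BE, so actually sending them needs an HTTP client that follows redirects for PUT, which is left out here.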

Elastic Scheduling Test

Test steps:

Import data into tbl_pk.

Stop a BE node and delete its local persistent index files and cache to simulate tablet migration.

Start a new import transaction and measure its latency, which now includes the time needed to rebuild the index.

The cloud‑native index achieved roughly a ten‑fold latency reduction compared with the local‑disk index.

Figure: Elastic scheduling results
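
The measurement in the last test step amounts to timing one import transaction per index type and comparing the two. A minimal Python sketch of that comparison follows; the function names are hypothetical, the import itself is passed in as a callable, and the numbers in the usage lines are placeholders, not measurements from this article.

```python
import time
from typing import Callable


def timed_ms(action: Callable[[], None]) -> float:
    """Run action once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    action()
    return (time.perf_counter() - start) * 1000.0


def latency_ratio(local_ms: float, cloud_native_ms: float) -> float:
    """Return how many times slower the local-disk run was."""
    if cloud_native_ms <= 0:
        raise ValueError("cloud_native_ms must be positive")
    return local_ms / cloud_native_ms


if __name__ == "__main__":
    # Placeholder values for illustration only, not results from the test.
    print(latency_ratio(local_ms=1000.0, cloud_native_ms=100.0))
```

In a real run, the callable passed to timed_ms would submit the first import after the tablet has been rescheduled, so that the index rebuild is included in the measured time.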

Enabling Cloud‑Native Persistent Index

Starting with StarRocks 3.3.1, enable the cloud‑native persistent index by setting persistent_index_type='CLOUD_NATIVE' in the table's PROPERTIES clause. Example:

CREATE TABLE `orders` (
  `o_orderkey` int(11) NOT NULL,
  `o_orderdate` date NOT NULL,
  ...
) ENGINE=OLAP
PRIMARY KEY(`o_orderkey`, `o_orderdate`)
DISTRIBUTED BY HASH(`o_orderkey`) BUCKETS 96
PROPERTIES (
  "enable_persistent_index" = "true",
  "persistent_index_type" = "CLOUD_NATIVE"
);

After creation, run SHOW CREATE TABLE to verify that persistent_index_type is set to CLOUD_NATIVE. Altering the index type with ALTER TABLE is not currently supported.
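
The verification step can also be scripted. The sketch below assumes the text returned by SHOW CREATE TABLE has already been fetched with some MySQL‑protocol client (not shown) and that the property appears in the same quoted form as in the CREATE TABLE statement above; the helper name is hypothetical.

```python
import re
from typing import Optional


def persistent_index_type(ddl: str) -> Optional[str]:
    """Return the persistent_index_type value found in a DDL string, if any."""
    match = re.search(r'"persistent_index_type"\s*=\s*"([^"]+)"', ddl)
    return match.group(1) if match else None


# Abbreviated DDL in the shape of the example above (columns elided).
SAMPLE_DDL = '''
CREATE TABLE `orders` (
  ...
) ENGINE=OLAP
PRIMARY KEY(`o_orderkey`, `o_orderdate`)
PROPERTIES (
  "enable_persistent_index" = "true",
  "persistent_index_type" = "CLOUD_NATIVE"
);
'''

if __name__ == "__main__":
    print(persistent_index_type(SAMPLE_DDL))
```

A return value of None means the property was not found in the text, which is worth treating as a failed check rather than assuming a default.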

Conclusion

For both large‑batch and small‑batch real‑time imports, cloud‑native and local‑disk persistent indexes deliver comparable performance.

In elastic scheduling scenarios, the cloud‑native index reduces latency roughly ten‑fold, because only a small amount of data has to be fetched to rebuild the index after a tablet is rescheduled.

Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
