
What’s New in StarRocks 3.5? Snapshot Backup, Bulk Load, Partition & Transaction Enhancements

StarRocks 3.5 introduces a cluster‑level Snapshot backup for fast recovery, a bulk‑load optimization that reduces small files and compaction cost, smarter partition management with time‑based merging and TTL, multi‑statement transactions with full ACID guarantees, low‑cardinality dictionary support for lake tables, and several security and performance upgrades.

StarRocks

Jun 26, 2025

Cluster‑Level Snapshot Backup

StarRocks 3.5 adds a cluster‑level Snapshot feature that automatically creates a point‑in‑time image of the entire cluster, including catalog, databases, tables, users and other metadata. The snapshot is stored in object storage (e.g., S3) and can be restored locally or to a different region within minutes.

A snapshot consists of two parts:

Metadata Snapshot : Generated by the Frontend (FE) through a checkpoint. It records schemas, table definitions, users, permissions, and other metadata.

Data Snapshot : references the data version already present in object storage; no data copy is required during backup.

When enabled, the system creates a new snapshot every 10 minutes by default, retains the latest snapshot and its dependent data version, and cleans up obsolete data automatically.
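As a rough sketch of how automated snapshots might be switched on and inspected: the statement forms below are assumed from the cluster snapshot documentation and should be confirmed for your version, and `snapshot_volume` is a hypothetical storage volume name.

```sql
-- Assumed syntax; verify against the documentation for your StarRocks version.
-- "snapshot_volume" is a hypothetical storage volume backed by object storage.
ADMIN SET AUTOMATED CLUSTER SNAPSHOT ON STORAGE VOLUME snapshot_volume;

-- List the snapshots the cluster currently retains (assumed system view name).
SELECT * FROM information_schema.cluster_snapshots;

-- Stop creating new snapshots.
ADMIN SET AUTOMATED CLUSTER SNAPSHOT OFF;
```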

Bulk Load Optimization (Load Spill)

Large‑scale data imports previously produced many small files, hurting query performance and increasing compaction overhead. The new Load Spill mechanism spills memtables to object storage during import, preventing the creation of excessive small files.

Benefits include:

Avoiding a proliferation of small files, reducing metadata management cost.

Achieving an optimal queryable state immediately after import.

Performance tests show that enabling Load Spill reduces import latency by 13-26% and improves overall import efficiency by 5-45%, depending on resource pressure.

-- Example configuration
SET enable_load_spill = true;

Partition Management Enhancements

StarRocks 3.5 introduces two major partition features:

Time‑Expression Partition Merging : Users can merge adjacent historical partitions into larger ones (e.g., daily partitions → monthly) via ALTER TABLE … PARTITION BY <time_expr>. This reduces partition count, improves optimizer pruning, and lowers memory usage.

General Partition TTL : By setting the table property partition_retention_condition, users can define a retention policy (e.g., keep the last three months). The system automatically drops expired partitions without manual intervention.

-- Merge the daily partitions covering 2024-01-01 through 2024-03-31 into monthly partitions
ALTER TABLE sales
PARTITION BY date_trunc('month', dt)
BETWEEN '2024-01-01' AND '2024-03-31';

-- Partition TTL: keep only partitions from the last three months
CREATE TABLE t1 (
  dt DATE,
  province STRING,
  num INT
) DUPLICATE KEY(dt)
PARTITION BY (dt, province)
PROPERTIES (
  "partition_retention_condition" = "dt >= CURRENT_DATE() - INTERVAL 3 MONTH",
  "replication_num" = "1"
);
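One way to check the effect of a merge or a retention policy is to list the table's partitions before and after the change. This is a minimal sketch using the `t1` table defined above. The exact output columns vary by version, so none are shown here.

```sql
-- List the current partitions of t1; run again after the retention
-- policy has been applied to confirm that expired partitions are gone.
SHOW PARTITIONS FROM t1;
```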

Materialized View Partition Enhancements

Asynchronous materialized views now support:

Multi‑column partition expressions matching the base table.

Partition‑level TTL, automatically keeping only the latest valid partitions during refresh.

Transparent query rewrite that respects partition TTL, falling back to the base table when needed.

These improvements accelerate queries on recent data while reducing storage costs.
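A hedged sketch of what such a materialized view could look like on the `t1` table from the previous section. The view name `mv_t1_recent` is hypothetical, and the example assumes that `partition_retention_condition` is accepted on materialized views in the same form as on tables.

```sql
-- Sketch under assumptions: mv_t1_recent is a hypothetical name, and the
-- retention property is assumed to work on materialized views as on tables.
CREATE MATERIALIZED VIEW mv_t1_recent
REFRESH ASYNC
PARTITION BY (dt, province)           -- multi-column partitioning matching t1
PROPERTIES (
  "partition_retention_condition" = "dt >= CURRENT_DATE() - INTERVAL 1 MONTH"
)
AS
SELECT dt, province, SUM(num) AS total_num
FROM t1
GROUP BY dt, province;
```

With this setup, an aggregate query restricted to the last month can be served from the view, while a query that reaches further back falls back to the base table for the partitions the view no longer keeps.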

Multi‑Statement Transaction (ACID)

StarRocks 3.5 adds full multi‑statement transaction support with standard BEGIN, COMMIT and ROLLBACK syntax. The transaction guarantees:

Atomicity : All statements succeed or none are applied.

Consistency : Writes respect table constraints.

Isolation : The current implementation provides Read Committed isolation, so uncommitted writes are invisible to other sessions.

Durability : In the shared-nothing (integrated storage and compute) architecture, data is persisted on local disks with replication; in the shared-data (separated storage and compute) architecture, data is persisted to object storage.

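BEGIN;
-- Both inserts become visible together at COMMIT, or not at all
INSERT INTO orders (order_id, customer_id, amount) VALUES (1, 101, 250.00);
-- Explicit select list instead of SELECT * so it matches the three target columns
-- (the column names in orders_details are assumed for illustration)
INSERT INTO order_items (order_id, item_id, quantity)
SELECT order_id, item_id, quantity FROM orders_details WHERE product_id = 1009;
COMMIT;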
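Atomicity is easiest to see on the failure path. The sketch below reuses the `orders` table from the example above and abandons the transaction instead of committing it. The order ID 2 is an arbitrary illustrative value.

```sql
BEGIN;
-- This write stays private to the transaction until COMMIT.
INSERT INTO orders (order_id, customer_id, amount) VALUES (2, 102, 99.00);
-- Abandon the transaction: none of its writes are applied.
ROLLBACK;

-- Run outside the transaction: the rolled-back row should not appear.
SELECT order_id FROM orders WHERE order_id = 2;
```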

Low‑Cardinality Dictionary for Lake Tables

Previously, low-cardinality dictionary encoding was only available for internal tables. Version 3.5 extends this optimization to lake tables (e.g., Hive, Iceberg) stored in object storage as Parquet files (LZ4-compressed).

The workflow:

During query planning, the optimizer detects low‑cardinality columns and triggers LowCardinalityRewriteRule to sample data.

Sampling reads a few files, builds a candidate dictionary, and caches it.

If the dictionary covers all values in the column, it is persisted globally; otherwise, the query raises a GlobalDictNotMatch exception and falls back to raw string processing.

Failed matches are logged, and asynchronous re‑sampling updates the dictionary.

Benchmark on a 100 GB SSB dataset shows a 2.62× speedup for queries involving low‑cardinality columns.

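-- Sample dictionary creation (simplified, illustrative only)
-- build_global_dict is a placeholder name here: in practice the dictionary is
-- built automatically during query planning, as described in the workflow above
SELECT build_global_dict(col) FROM lake_table LIMIT 5;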
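For context, the kind of query this optimization targets groups or filters on a string column with few distinct values. The example below is only a sketch: the catalog, database, table, and column names (`iceberg_catalog.ssb.customer`, `c_region`) are hypothetical.

```sql
-- Hypothetical lake table; c_region is assumed to hold only a handful of
-- distinct string values, which makes it a candidate for dictionary encoding.
SELECT c_region, COUNT(*) AS customers
FROM iceberg_catalog.ssb.customer
WHERE c_region IN ('ASIA', 'EUROPE')
GROUP BY c_region;
```

No query changes are needed to benefit. According to the workflow above, the optimizer decides during planning whether a dictionary can be used.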

Security and Performance Updates

Additional enhancements in 3.5 include:

Support for OAuth 2.0, JWT, and OIDC authentication schemes.

SSL encryption for MySQL protocol connections.

Upgrade of the default runtime JDK from 11 to 17, improving memory management and overall stability.

Beta support for creating and modifying Iceberg views, including nested namespace handling.

Release notes: https://docs.mirrorship.cn/zh/releasenotes/release-3.5/
