
How StarRocks 3.5 Enables Fast Cluster Snapshots and Disaster Recovery in Kubernetes

StarRocks 3.5 introduces a cluster‑level snapshot mechanism that automates backup to object storage, supports minute‑level recovery, and integrates with Kubernetes via Helm charts to streamline disaster‑recovery workflows for high‑availability workloads.

StarRocks, a cloud‑native analytical database, enhances its disaster‑recovery capabilities in version 3.5 by adding a Cluster Snapshot mechanism. This feature creates automated, low‑cost snapshots of the entire cluster state—including metadata and data versions—and stores them in object storage, enabling recovery within minutes.

Snapshot Architecture

Each snapshot consists of two parts:

Metadata Snapshot: Generated by the Frontend (FE) through periodic checkpoints, this image file captures catalog, database, table, user, and permission information.

Data Snapshot: Since data resides in object storage, the snapshot records only references to the relevant data versions, avoiding data duplication.

Automated snapshots are created by default every 10 minutes (configurable) and retain only the latest snapshot to minimize storage overhead.
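The interval is controlled on the FE side; a minimal configuration sketch (the parameter name `automated_cluster_snapshot_interval_seconds` reflects the StarRocks FE configuration as documented, but verify it against your version's reference):

```properties
# fe.conf — automated snapshot interval in seconds; 600 = the 10-minute default
automated_cluster_snapshot_interval_seconds = 600
```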

Recovery Process in Kubernetes

When a failure occurs, the StarRocks Operator uses the disasterRecovery and disasterRecoveryStatus fields in the StarRocksCluster CRD to manage recovery phases (TODO → doing → done). The operator monitors the FE pod status, updates the phase accordingly, and finally restores the cluster state from the specified snapshot path in object storage.

spec:
  disasterRecovery:
    generation: 1
    enabled: true
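Recovery progress is mirrored in the CRD's status. A sketch of what the operator might report (the phase values TODO → doing → done come from the operator's recovery phases; the exact status field names are assumptions and may differ in your operator version):

```yaml
status:
  disasterRecoveryStatus:
    phase: done              # progresses TODO -> doing -> done
    observedGeneration: 1    # assumed field; tracks disasterRecovery.generation
```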

Key steps include:

Enable disaster recovery by setting disasterRecovery.enabled to true and specifying a generation number.

Mount a cluster_snapshot.yaml ConfigMap that defines the snapshot location and storage volume.

Deploy the cluster with Helm, providing both the base values file and the snapshot ConfigMap.

The operator creates a new FE pod, applies the snapshot, and restores metadata and data versions.

After successful restoration, the operator launches remaining FE and CN pods, returning the cluster to a running state.
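The steps above assume a ConfigMap shaped roughly like the following. The data key `cluster_snapshot.yaml` is the file the FE expects to find mounted; the ConfigMap name is an assumption, and the inner schema (snapshot path and storage volume definition) is elided here — consult the StarRocks restore documentation for the exact fields:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-snapshot        # name is an assumption; must match the Helm values
data:
  cluster_snapshot.yaml: |
    # Defines the snapshot location in object storage and the storage
    # volume to restore from; see the StarRocks docs for the full schema.
```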

Practical Commands

Typical Helm installation:

helm install -f ./starrocks-values.yaml -f cluster_snapshot.yaml starrocks starrocks-community/kube-starrocks --version 1.10.0

Enable automated snapshots via SQL:

ADMIN SET AUTOMATED CLUSTER SNAPSHOT ON STORAGE VOLUME builtin_storage_volume;
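Automated snapshots can later be switched off with the matching statement (syntax per the StarRocks SQL reference; confirm for your version):

```sql
ADMIN SET AUTOMATED CLUSTER SNAPSHOT OFF;
```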

Query snapshot jobs and status:

SELECT * FROM INFORMATION_SCHEMA.CLUSTER_SNAPSHOT_JOBS;
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_SNAPSHOTS;

Verify data after recovery by connecting to the FE pod and querying the restored tables.
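Verification can be as simple as a row‑count check against a table whose size you recorded before the failure (the database and table names below are placeholders):

```sql
-- Compare against the count recorded before the failure
SELECT COUNT(*) AS row_count FROM mydb.orders;
```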

Considerations and Caveats

The snapshot feature currently supports only the shared‑data (compute‑storage‑separated) architecture.

When enabling disaster recovery on an existing cluster, old FE metadata and CN data must be cleaned to avoid conflicts.

If snapshot creation fails, multiple snapshot directories may appear in S3; the older one should be used for restoration.

Automated snapshots retain a single latest snapshot; manual retention policies must be applied if longer history is required.

This guide provides a complete workflow—from configuring Helm charts and snapshot settings to executing recovery and validating results—helping users achieve high availability and data safety for StarRocks deployments.

Written by StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
