
StarRocks 2.5.13 Cross-Cluster Upgrade and Data Migration Practices

The article outlines a cross‑cluster upgrade to StarRocks 2.5.13, evaluating resource and stability costs, and presents two migration schemes—using external tables and a Flink connector—along with planning, parallel execution, validation steps, and results showing successful migration of over 10 TB at 2 Gb/s across ten nodes, while noting future automation and CDC enhancements.

DeWu Technology

Background: Before 2024, many StarRocks clusters were still running older versions with known stability issues, which prompted an upgrade to 2.5.13.

Evaluation: The migration cost is divided into resource cost (person‑hours, hardware, bandwidth) and stability cost (impact on online services).

Solution Process

1. Feasibility Evaluation

Assess migration from the perspective of resource consumption and system stability.

2. Design

Two practical schemes are presented.

Scheme 1 – StarRocks External Table

Technical principle: since version 1.19, StarRocks supports writing to an external table that points to another StarRocks cluster. Create a matching table in the target cluster, then create an external table in the source cluster that references it, and INSERT data through the external table.

Advantages: simple data sync, read/write separation.

Code example:

CREATE EXTERNAL TABLE external_db.external_t (
    k1 DATE,
    k2 INT,
    k3 SMALLINT,
    k4 VARCHAR(2048),
    k5 DATETIME
) ENGINE=olap
DUPLICATE KEY(k1)
PARTITION BY RANGE(k1) (
    PARTITION p20231016 VALUES [("2023-10-16"), ("2023-10-17")),
    PARTITION p20231017 VALUES [("2023-10-17"), ("2023-10-18"))
)
DISTRIBUTED BY HASH(k1) BUCKETS 10
PROPERTIES (
    "host" = "127.0.0.x",
    "port" = "9020",
    "user" = "${user}",
    "password" = "${passwd}",
    "database" = "test_db",
    "table" = "t"
);

Insert data:

INSERT INTO external_db.external_t SELECT * FROM db.other_table;

Scheme 2 – Flink Connector

Technical principle: Flink provides a unified stream‑batch engine. The StarRocks Flink connector enables both reading and writing.

Key steps: define source table, define sink table, write ETL SQL.

Source table example:

CREATE TABLE rule_script_etl_source (
  `timestamp` TIMESTAMP,
  `identity_id` STRING,
  `app` STRING,
  `cost` BIGINT,
  `name` STRING,
  `error` STRING,
  `script` STRING,
  `rule_id` STRING
) WITH (
  'connector'='du-starrocks-1.27',
  'jdbc-url'='jdbc:mysql://1.1.1.1:9030?useSSL=false&rewriteBatchedStatements=true',
  'scan-url'='1.1.1.1:8030',
  'user'='${user}',
  'password'='${passwd}',
  'database-name'='test_db',
  'table-name'='rule_script_etl',
  'scan.max-retries'='3',
  'scan.connect.timeout-ms'='600000',
  'scan.params.keep-alive-min'='1440',
  'scan.params.query-timeout-s'='86400',
  'scan.params.mem-limit-byte'='1073741824'
);

Sink table example (primary‑key model):

CREATE TABLE rule_script_etl_sink (
  `timestamp` TIMESTAMP,
  `identity_id` STRING,
  `app` STRING,
  `rule_id` STRING,
  `uid` BIGINT,
  `cost` BIGINT,
  `name` STRING,
  `error` BIGINT,
  `script` STRING,
  `sink_time` TIMESTAMP,
  PRIMARY KEY(`identity_id`) NOT ENFORCED
) WITH (
  'connector'='du-starrocks-1.27',
  'jdbc-url'='jdbc:mysql://1.1.1.2:9030?useSSL=false&rewriteBatchedStatements=true',
  'load-url'='1.1.1.2:8030',
  'user'='${user}',
  'password'='${passwd}',
  'database-name'='test_db',
  'table-name'='rule_script_etl',
  'sink.buffer-flush.max-rows'='400000',
  'sink.buffer-flush.max-bytes'='94371840',
  'sink.buffer-flush.interval-ms'='30000',
  'sink.connect.timeout-ms'='60000',
  'sink.wait-for-continue.timeout-ms'='60000',
  'sink.properties.column_separator'='#=#',
  'sink.properties.row_delimiter'='@=@'
);

ETL example:

INSERT INTO rule_script_etl_sink
SELECT
  `timestamp`,
  `identity_id`,
  `app`,
  `rule_id`,
  CAST(NULL AS BIGINT) `uid`,
  `cost`,
  `name`,
  CAST(`error` AS BIGINT) `error`,
  `script`,
  `timestamp` AS `sink_time`
FROM rule_script_etl_source
WHERE `timestamp` >= '2023-08-20 00:00:00' AND `timestamp` < '2023-09-20 00:00:00';

Planning

Estimate the required average sync rate as total data volume (MB) ÷ (total migration days × sync hours per day), and check that it stays within the available bandwidth. Two methods are provided to estimate the total sync time.
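The rate formula above can be sketched as a one-line helper; the data volume, day count, and daily sync window below are hypothetical illustration figures, not numbers from the actual migration:

```python
def required_rate_mb_per_hour(total_mb: float, total_days: int, hours_per_day: float) -> float:
    """Average sync rate needed to move total_mb within the migration window."""
    return total_mb / (total_days * hours_per_day)

# Example: 10 TB over 20 days, syncing 8 hours per day (hypothetical figures).
rate = required_rate_mb_per_hour(10 * 1024 * 1024, 20, 8)
print(f"{rate:.0f} MB/h")  # → 65536 MB/h
```

Comparing this required rate against the measured per-node write throughput tells you whether the planned window is feasible or needs more parallel jobs.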

Implementation

Set appropriate parallelism (e.g., one slot per TaskManager and a job parallelism of 3), and split large tables by partition so they can be migrated incrementally.
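The partition-splitting step can be sketched as a small helper that turns a table's time range into per-batch WHERE predicates, one batch per Flink job run; the dates and batch size below are hypothetical:

```python
from datetime import date, timedelta

def daily_batches(start: date, end: date, days_per_batch: int):
    """Split the half-open range [start, end) into consecutive date batches
    for incremental migration of a large partitioned table."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days_per_batch), end)
        yield cur, nxt
        cur = nxt

# Example: split one month of partitions into 7-day migration batches.
for lo, hi in daily_batches(date(2023, 8, 20), date(2023, 9, 20), 7):
    print(f"WHERE `timestamp` >= '{lo}' AND `timestamp` < '{hi}'")
```

Each generated predicate maps directly onto the time filter in the ETL SQL shown earlier, so a failed batch can be retried in isolation without redoing the whole table.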

Validation & Acceptance

Validate cluster load (keep resource watermarks below 80%), perform row-count checks, and run data-quality checks (SUM/MAX/MIN/AVG on fact tables; partition-level checks on dimension tables).
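A minimal sketch of the aggregate comparison, assuming the per-table stats have already been fetched from both clusters (e.g. via `SELECT COUNT(*), SUM(cost), AVG(cost) ...`); the metric names and values below are hypothetical:

```python
def find_mismatches(source_stats: dict, target_stats: dict, float_tol: float = 1e-6) -> list:
    """Compare per-metric aggregates (COUNT/SUM/MAX/MIN/AVG) from the source
    and target clusters; return the names of metrics that disagree."""
    mismatches = []
    for metric, src in source_stats.items():
        dst = target_stats.get(metric)
        if dst is None or abs(src - dst) > float_tol:
            mismatches.append(metric)
    return mismatches

# Hypothetical stats pulled from both clusters for one fact table.
src = {"count": 120_000, "sum_cost": 9.5e9, "avg_cost": 79166.666}
dst = {"count": 120_000, "sum_cost": 9.5e9, "avg_cost": 79166.666}
print(find_mismatches(src, dst))  # → [] (empty list means the table passes)
```

For dimension tables the same comparison can be run per partition instead of per table, matching the acceptance rule above.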

Results

The approach successfully upgraded multiple clusters to 2.5.13, migrated over 10 TB of data with peak throughput of 2 Gb/s across more than 10 nodes, and avoided instability caused by in‑place upgrades.

Outlook & Limitations

Automation of task analysis, splitting, scheduling, and fault tolerance can still be improved. Real-time CDC is not yet supported; the current solution relies on offline synchronization.

Future Plans

Package atomic capabilities (schema sync, table rebuild, etc.) for broader use, integrate CDC for seamless upgrades, and explore cross‑cluster migration for scenarios such as cluster splitting or merging.
