
How StarRocks Compaction Boosts Query Performance: Mechanics, Tuning, and Best Practices

This article explains StarRocks' compaction process, which merges multiple data versions into larger files to reduce I/O. It details the scheduler and executor roles, shows how to monitor and control compaction via SQL commands, and provides tuning parameters and best-practice recommendations for optimal performance.

StarRocks

Jun 18, 2024

StarRocks creates a new data version for each load operation. To return correct results, a query must merge all of these versions, so an accumulation of many small files degrades query efficiency. Compaction periodically merges historical versions into larger files, eliminating duplicate records and improving query performance.

Data Versioning and Compaction Basics

Each load produces a new version, which the Frontend (FE) generates and records on the partition. A partition may contain multiple tablets that share the same version number; when a transaction commits, all tablets in the partition increment their visible version.

Compaction creates a new version by merging small files from previous versions. After compaction, the original files can be safely deleted because their data is now present in the merged file.
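The merge step can be pictured with a small sketch. It is an illustration only, not StarRocks' storage code: it assumes each version is a list of (key, value) rows and that, for the same key, the row from the newest version wins. The function name compact_versions is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (key, value); a simplified stand-in for a real row


def compact_versions(versions: List[List[Row]]) -> List[Row]:
    """Merge versions (oldest first) into one sorted list of rows.

    Assumption: when the same key appears in several versions, the row
    from the newest version wins, so duplicates collapse to one record.
    """
    merged: Dict[str, int] = {}
    for rows in versions:          # iterate oldest -> newest
        for key, value in rows:
            merged[key] = value    # newer versions overwrite older ones
    return sorted(merged.items())


# Three small versions, each produced by a separate load.
v1 = [("a", 1), ("b", 2)]
v2 = [("b", 20), ("c", 3)]
v3 = [("a", 100)]

compacted = compact_versions([v1, v2, v3])
print(compacted)  # [('a', 100), ('b', 20), ('c', 3)]
```

After the merge, a reader needs only the single compacted list rather than three separate versions, which is why the original files become safe to delete.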

Compaction Scheduler and Executor

In StarRocks' compute-storage separation (shared-data) architecture, the FE acts as the Compaction Scheduler, while BE or CN nodes act as Compaction Executors. The scheduler runs a periodic thread that selects the partitions with the highest Compaction Score (a metric indicating how urgently a partition needs merging) and constructs Compaction Tasks for them.

For each selected partition, the scheduler gathers all tablets, groups them by the compute node (CN) they reside on, and creates a task containing the tablet list for that CN. The tasks are then sent to the respective CNs, where a dedicated thread pool executes them.
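The grouping step described above can be sketched as follows. This is a simplified model under stated assumptions, not the FE's actual code: the Tablet and CompactionTask classes, their fields, and the build_tasks function are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tablet:
    tablet_id: int
    node_id: int  # hypothetical: the CN this tablet resides on


@dataclass
class CompactionTask:
    node_id: int
    tablet_ids: List[int]


def build_tasks(tablets: List[Tablet]) -> List[CompactionTask]:
    """Group a partition's tablets by node and emit one task per node."""
    by_node: Dict[int, List[int]] = defaultdict(list)
    for t in tablets:
        by_node[t.node_id].append(t.tablet_id)
    # Sort by node id only to make the output order deterministic.
    return [CompactionTask(node_id=n, tablet_ids=ids)
            for n, ids in sorted(by_node.items())]


partition_tablets = [Tablet(1001, 1), Tablet(1002, 2), Tablet(1003, 1), Tablet(1004, 3)]
tasks = build_tasks(partition_tablets)
for task in tasks:
    print(task.node_id, task.tablet_ids)
```

Each resulting task carries only the tablets that live on its target node, so a node never receives work for data it does not serve.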

Compaction Score

The FE maintains a Compaction Score for each partition, calculated from the scores of its tablets. Higher scores indicate more urgent need for compaction. The scheduler only initiates tasks for partitions whose score exceeds a configurable threshold.
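A minimal sketch of threshold-based selection follows. It assumes the per-partition scores are already known, treats a score equal to the threshold as eligible, and uses a hypothetical limit argument to cap how many partitions are picked per round; none of these details are taken from the StarRocks source. The default of 10.0 mirrors the lake_compaction_score_selector_min_score value listed under Parameter Tuning.

```python
from typing import Dict, List, Tuple


def pick_partitions(
    scores: Dict[str, float],
    min_score: float = 10.0,
    limit: int = 2,
) -> List[Tuple[str, float]]:
    """Return up to `limit` partitions at or above `min_score`, highest score first."""
    eligible = [(name, s) for name, s in scores.items() if s >= min_score]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return eligible[:limit]


# Hypothetical partition names and scores.
partition_scores = {
    "p20240601": 4.0,
    "p20240602": 37.5,
    "p20240603": 12.0,
    "p20240604": 18.2,
}
selected = pick_partitions(partition_scores)
print(selected)  # [('p20240602', 37.5), ('p20240604', 18.2)]
```

Partition p20240601 is never considered because its score is below the threshold, and p20240603 waits for a later round because the cap is already reached by more urgent partitions.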

Viewing Compaction Score

Run the following command on the Leader FE to see the score of each partition:

MySQL [(none)]> show proc '/dbs/load_benchmark/store_sales/partitions';

The output includes columns such as AvgCS (average score) and MaxCS (maximum score) for the partition.

Viewing Compaction Tasks

To list all ongoing compaction tasks:

MySQL [(none)]> show proc '/compactions';

Each row shows the partition, transaction ID, start/commit/finish times, and any error information.

For detailed progress of sub‑tasks on a specific transaction ID, query:

MySQL [(none)]> select * from information_schema.be_cloud_native_compactions where TXN_ID = 197562;

The PROGRESS column indicates the percentage completed, and STATUS shows the task state or error details.

Canceling a Compaction Task

To cancel a specific compaction task (executed on the Leader FE):

CANCEL COMPACTION WHERE TXN_ID = 123;

Parameter Tuning

FE Parameters

lake_compaction_score_selector_min_score = 10.0: partitions with a score below this value will not trigger compaction.

lake_compaction_max_tasks = -1: maximum number of compaction tasks the FE can launch simultaneously; -1 lets the FE auto-calculate the limit based on BE count.

lake_compaction_history_size = 12 and lake_compaction_fail_history_size = 12: control how many recent successful and failed compaction records, respectively, are retained.

These parameters can be modified at runtime. For example, the following statement sets lake_compaction_max_tasks to 0, which disables compaction:

admin set frontend config ("lake_compaction_max_tasks" = "0");

BE / CN Parameters

compact_threads = 4: number of threads on each BE/CN that can execute compaction tasks concurrently.

compact_thread_pool_queue_size = 100: maximum number of pending compaction tasks the BE can accept.

max_cumulative_compaction_num_singleton_deltas = 100: maximum number of small files merged in a single compaction; reducing this value can make tasks finish faster with lower resource usage.

BE parameters can also be changed dynamically, for example:

mysql> update information_schema.be_configs set value = 8 where name = "compact_threads";

Best Practices

Monitor Compaction Score and set alerts; StarRocks provides Grafana dashboards for this metric.

Watch resource consumption, especially memory, during compaction; Grafana templates also expose these metrics.

When the cluster is idle, increase the parallel compaction thread count on compute nodes to accelerate merging.

For detailed monitoring and alerting, refer to the Prometheus and Grafana integration documentation.


