Best Practices for Apache Doris Compaction in Production Environments

This article outlines practical production‑level optimizations for Apache Doris compaction, covering vertical, segment, and single‑replica compaction methods, compaction policies, concurrency controls, and data‑ingestion tuning to improve import speed and query performance in OLAP workloads.

Big Data Technology & Architecture

This article is a practical guide to optimizing Apache Doris compaction in production, a topic that comes up increasingly often in both OLAP interviews and real‑world deployments.

1. Doris Optimizations and New Version Features

Doris stores data using an LSM‑Tree‑like structure and continuously merges small files into larger, ordered files via compaction, handling deletions and updates. Adjusting compaction strategies can significantly boost import and query efficiency.

Vertical Compaction

Introduced in Doris 1.2.2, vertical compaction reduces memory usage and speeds up compaction by merging data column‑wise instead of row‑wise. The relevant settings are:

enable_vertical_compaction = true – enables the feature.

vertical_compaction_num_columns_per_group = 5 – sets the number of columns per group (the default of 5 provides a good balance).

vertical_compaction_max_segment_size – configures the size of files after vertical compaction (default 268435456 bytes, i.e. 256 MB).
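To make the column‑group idea concrete, here is a minimal Python sketch of splitting a table's columns into groups of vertical_compaction_num_columns_per_group. It illustrates only the grouping arithmetic: the function name and the sample column names are hypothetical, and the actual grouping inside Doris (for example, how key columns are treated) may differ.

```python
from typing import List


def split_into_column_groups(columns: List[str],
                             columns_per_group: int = 5) -> List[List[str]]:
    """Chunk a column list into consecutive groups of at most columns_per_group.

    Hypothetical helper for illustration; not part of Doris.
    """
    if columns_per_group < 1:
        raise ValueError("columns_per_group must be at least 1")
    return [
        columns[i:i + columns_per_group]
        for i in range(0, len(columns), columns_per_group)
    ]


# Hypothetical 12-column table. Each group would be merged in its own pass,
# so only one group's columns need to be held in memory at a time.
columns = [f"c{i}" for i in range(12)]
groups = split_into_column_groups(columns)
print([len(g) for g in groups])
```

With the default group size, a wide table is processed a few columns at a time, which is where the memory saving over row‑wise merging comes from.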

Segment Compaction

Targeted at large‑batch imports, segment compaction merges multiple segments within a single batch, reducing the final segment count and preventing the OLAP_ERR_TOO_MANY_SEGMENTS error. The relevant settings are:

enable_segcompaction = true – enables segment compaction.

segcompaction_batch_size – defines how many segments trigger a compaction (default 10, typically set between 10 and 30).

Recommended usage scenarios include massive data imports that hit the segment limit, frequent small‑file generation, immediate post‑import queries, and high compaction pressure after bulk loads.

Do not enable it when the import process already exhausts memory resources, since segment compaction adds further memory pressure during the load.

Single‑Replica Compaction

When enabled, only one replica performs the compaction and the other replicas pull the merged files from it, so the merge work is done once rather than once per replica. The CPU saved therefore grows with the replica count.

Activate via the table property enable_single_replica_compaction = true, either at table creation or afterwards with:

ALTER TABLE table_name SET ("enable_single_replica_compaction" = "true");

Compaction Policies

The compaction policy determines when small files are merged. Doris offers two policies:

size_based (default)

"compaction_policy" = "size_based"

time_series

Optimized for log or time‑series data, merging files based on time locality.

"compaction_policy" = "time_series"

Time‑series compaction triggers when any of the following conditions is met:

Unmerged file size exceeds time_series_compaction_goal_size_mbytes (default 1 GB).

Unmerged file count exceeds time_series_compaction_file_count_threshold (default 2000).

Time since last compaction exceeds time_series_compaction_time_threshold_seconds (default 1 hour).
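The three trigger conditions combine as a simple logical OR. The sketch below models that rule in Python using the defaults quoted above. The class and function names are hypothetical, and this is an illustration of the rule as described here, not Doris's internal implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSeriesThresholds:
    # Defaults as stated in this article.
    goal_size_mbytes: int = 1024        # time_series_compaction_goal_size_mbytes
    file_count_threshold: int = 2000    # time_series_compaction_file_count_threshold
    time_threshold_seconds: int = 3600  # time_series_compaction_time_threshold_seconds


def should_compact(unmerged_mbytes: float,
                   unmerged_files: int,
                   seconds_since_last: float,
                   t: TimeSeriesThresholds = TimeSeriesThresholds()) -> bool:
    """Return True if any one of the three thresholds is exceeded."""
    return (
        unmerged_mbytes > t.goal_size_mbytes
        or unmerged_files > t.file_count_threshold
        or seconds_since_last > t.time_threshold_seconds
    )


# A quiet tablet: little data, few files, compacted a minute ago.
print(should_compact(100, 50, 60))
# Same tablet two hours after its last compaction: the time threshold fires.
print(should_compact(100, 50, 7200))
```

Because the time threshold fires on its own, even a low‑traffic tablet gets compacted eventually, while busy tablets are usually caught by the size or file‑count limits first.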

2. Compaction Concurrency Control

Compaction consumes CPU and I/O; concurrency can be tuned via BE configuration:

max_base_compaction_threads – base compaction threads (default 4).

max_cumu_compaction_threads – cumulative compaction threads (default 10).

max_single_replica_compaction_threads – threads for pulling files in single‑replica mode (default 10).

3. Data‑Production Side Optimizations

Improving the data‑ingestion side is crucial; high commit frequency can cause write amplification and compaction lag.

Key parameters to adjust (sink options as exposed by the Flink Doris Connector):

sink.buffer-flush.max-bytes – maximum bytes per batch (default 10 MB, recommend >100 MB in production).

sink.buffer-flush.interval – asynchronous flush interval (default 10 s, recommend >60 s).

By increasing batch size and flush interval, the number of commits is reduced, easing pressure on background compaction.
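A rough back‑of‑the‑envelope model shows the effect. The Python sketch below assumes a simplified writer that flushes as soon as the buffer is full or the interval elapses, whichever comes first. The function name and the 5 MB/s throughput are illustrative assumptions, and real connectors have additional triggers (row limits, checkpoints) that this ignores.

```python
def commits_per_hour(throughput_mb_per_s: float,
                     max_batch_mb: float,
                     flush_interval_s: float) -> float:
    """Estimate flushes per hour under a simplified size-or-time flush model."""
    if throughput_mb_per_s <= 0 or max_batch_mb <= 0 or flush_interval_s <= 0:
        raise ValueError("all arguments must be positive")
    seconds_to_fill = max_batch_mb / throughput_mb_per_s
    # Whichever limit is reached first triggers the flush.
    seconds_per_commit = min(seconds_to_fill, flush_interval_s)
    return 3600 / seconds_per_commit


# Assumed steady ingest rate of 5 MB/s (hypothetical workload).
default_commits = commits_per_hour(5, max_batch_mb=10, flush_interval_s=10)
tuned_commits = commits_per_hour(5, max_batch_mb=100, flush_interval_s=60)
print(default_commits, tuned_commits)
```

Under these assumptions the tuned settings produce a tenth as many commits, and therefore far fewer small files for background compaction to merge.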

These guidelines provide a solid framework for answering interview questions on Doris compaction and for tuning Doris clusters in real‑world environments.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
