Best Practices for Apache Doris Compaction in Production Environments
This article is a practical guide to tuning Apache Doris compaction in production. It covers vertical, segment, and single‑replica compaction, compaction policies, concurrency controls, and ingestion‑side tuning, all aimed at faster imports and queries in OLAP workloads. The topic matters in real‑world deployments and comes up increasingly often in OLAP interview questions.
1. Doris Optimizations and New Version Features
Doris stores data in an LSM‑Tree‑like structure. Background compaction continuously merges small files into larger, ordered files and applies deletions and updates along the way. Tuning the compaction strategy can significantly improve import and query efficiency.
Vertical Compaction
Introduced in Doris 1.2.2, vertical compaction reduces memory usage and speeds up compaction by merging data column‑wise instead of row‑wise.
enable_vertical_compaction = true – enables the feature.
vertical_compaction_num_columns_per_group = 5 – sets the number of columns per group (the default of 5 provides a good balance).
vertical_compaction_max_segment_size – configures the size of files after vertical compaction (default 268435456 bytes).
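To make the grouping parameter concrete, the sketch below splits a list of columns into groups of vertical_compaction_num_columns_per_group. It only illustrates the arithmetic: the function name and column names are hypothetical, and it ignores any special handling Doris applies to key columns.

```python
def split_column_groups(columns, num_columns_per_group=5):
    """Split column names into consecutive groups of a fixed size.

    Illustrative only: the real compaction code may group key columns
    differently; this shows how the group count follows from the setting.
    """
    if num_columns_per_group <= 0:
        raise ValueError("num_columns_per_group must be positive")
    return [
        columns[i:i + num_columns_per_group]
        for i in range(0, len(columns), num_columns_per_group)
    ]


# Hypothetical 12-column table: c0 .. c11
columns = [f"c{i}" for i in range(12)]
groups = split_column_groups(columns)
for index, group in enumerate(groups):
    print(index, group)
```

With the default of 5, a 12‑column table is processed as three groups, so only a fraction of the columns has to be held in memory at a time.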
Segment Compaction
Targeted at large‑batch imports, segment compaction merges multiple segments within a single batch, reducing the final segment count and preventing the OLAP_ERR_TOO_MANY_SEGMENTS error.
enable_segcompaction = true – enables segment compaction.
segcompaction_batch_size – defines how many segments trigger a compaction (default 10, typically set between 10 and 30).
Recommended usage scenarios include massive data imports that hit the segment limit, frequent small‑file generation, immediate post‑import queries, and high compaction pressure after bulk loads.
Do not enable segment compaction when the import process already exhausts memory resources, because the extra merge work adds further memory pressure.
Single‑Replica Compaction
When enabled, only one replica performs compaction and the other replicas pull the merged files from it. The CPU cost of merging is paid once instead of once per replica, so the savings grow with the replica count.
Activate it via the table property enable_single_replica_compaction = true, either at table creation or on an existing table with:
ALTER TABLE table_name SET ("enable_single_replica_compaction" = "true");
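When many existing tables need the property, the statements can be generated rather than typed by hand. The following is a minimal sketch that only builds the SQL text in the form shown above. The function name and the table names are hypothetical, and the output still has to be run through whatever SQL client you use.

```python
def single_replica_compaction_ddl(tables, enabled=True):
    """Build one ALTER TABLE statement per table name.

    Assumes plain, trusted table names; no quoting or escaping is done.
    """
    value = "true" if enabled else "false"
    return [
        f'ALTER TABLE {table} SET ("enable_single_replica_compaction" = "{value}");'
        for table in tables
    ]


# Hypothetical table names, for illustration only.
statements = single_replica_compaction_ddl(["orders", "access_log"])
for statement in statements:
    print(statement)
```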
Compaction Policies
The compaction policy determines when small files are merged. Doris offers two policies:
size_based (default)
"compaction_policy" = "size_based"
time_series
Optimized for log or time‑series data, merging files based on time locality.
"compaction_policy" = "time_series"
Time‑series compaction triggers when any of the following conditions is met:
Unmerged file size exceeds time_series_compaction_goal_size_mbytes (default 1 GB).
Unmerged file count exceeds time_series_compaction_file_count_threshold (default 2000).
Time since last compaction exceeds time_series_compaction_time_threshold_seconds (default 1 hour).
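The three conditions are combined with a logical OR. The sketch below restates them as a small predicate using the defaults listed above. The class and function names are hypothetical, and this models only the rule as described here, not the actual Doris scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSeriesThresholds:
    # Defaults as quoted in the text above.
    goal_size_mbytes: int = 1024          # 1 GB
    file_count_threshold: int = 2000
    time_threshold_seconds: int = 3600    # 1 hour


def should_trigger(unmerged_mbytes, unmerged_files, seconds_since_last,
                   thresholds=None):
    """Return True if any one of the three trigger conditions is met."""
    t = thresholds or TimeSeriesThresholds()
    return (
        unmerged_mbytes > t.goal_size_mbytes
        or unmerged_files > t.file_count_threshold
        or seconds_since_last > t.time_threshold_seconds
    )


# Small backlog, recent compaction: nothing triggers.
print(should_trigger(100, 50, 60))
# File count alone exceeds its threshold: compaction triggers.
print(should_trigger(100, 2500, 60))
```

Because any single condition is enough, a table that receives many tiny files will hit the file‑count threshold long before it reaches the size goal.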
2. Compaction Concurrency Control
Compaction consumes CPU and I/O; concurrency can be tuned via BE configuration:
max_base_compaction_threads – base compaction threads (default 4).
max_cumu_compaction_threads – cumulative compaction threads (default 10).
max_single_replica_compaction_threads – threads for pulling files in single‑replica mode (default 10).
3. Data‑Production Side Optimizations
Improving the data‑ingestion side is crucial; high commit frequency can cause write amplification and compaction lag.
Key parameters to adjust:
sink.buffer-flush.max-bytes – maximum bytes per batch (default 10 MB, recommend >100 MB in production).
sink.buffer-flush.interval – asynchronous flush interval (default 10 s, recommend >60 s).
By increasing batch size and flush interval, the number of commits is reduced, easing pressure on background compaction.
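A rough estimate shows how much the two settings change the commit rate. The sketch assumes a single writer with a steady ingest rate, that a flush happens when the buffer fills or the interval elapses (whichever comes first), and that every flush produces one commit. These are simplifying assumptions for illustration, and the function name is hypothetical.

```python
MB = 1024 * 1024


def commits_per_hour(ingest_bytes_per_sec, max_bytes, interval_sec):
    """Estimate commits per hour for one writer at a steady ingest rate."""
    if ingest_bytes_per_sec <= 0 or max_bytes <= 0 or interval_sec <= 0:
        raise ValueError("all arguments must be positive")
    seconds_to_fill = max_bytes / ingest_bytes_per_sec
    # A flush fires on whichever limit is reached first.
    flush_period = min(seconds_to_fill, interval_sec)
    return 3600 / flush_period


# Assumed ingest rate of 1 MB/s, for illustration only.
rate = 1 * MB
default_commits = commits_per_hour(rate, 10 * MB, 10)    # defaults: 10 MB, 10 s
tuned_commits = commits_per_hour(rate, 100 * MB, 60)     # tuned: 100 MB, 60 s
print(default_commits, tuned_commits)
```

Under these assumptions the tuned settings cut the commit count by a factor of six, which means correspondingly fewer small files for compaction to merge.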
These guidelines provide a solid framework for answering interview questions on Doris compaction and for tuning Doris clusters in real‑world environments.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
