Best Practices for Setting Buckets and Partitions in Apache Doris
This article explains how improper bucket and partition settings in Apache Doris can degrade read/write performance, provides quantitative guidelines for choosing bucket and partition counts, and introduces the automatic bucket feature with practical syntax and usage tips.
1. Issues Caused by Improper Bucket Settings
Problem description: After a period of operation, as data grew, the cluster became increasingly slow at reads and writes, eventually failing to serve them at all.
Problem resolution: We analyzed the schemas of the data warehouse tables and found many tables with little data but excessively large bucket counts.
Used the SHOW DATA FROM command to list bucket information for all tables; most bucket settings turned out to be unreasonable.
Adjusted bucket numbers according to official recommendations, after which the cluster gradually recovered normal read/write performance.
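As a sketch of the inspection step above, Doris exposes per-table size and tablet information through SHOW statements (the database and table names here are hypothetical):

```sql
-- Per-table data size and replica count; dividing ReplicaCount by the
-- replication factor gives the tablet count to sanity-check.
SHOW DATA FROM example_db.example_tbl;

-- Individual tablets of a table, including their on-disk sizes,
-- useful for spotting many near-empty tablets.
SHOW TABLETS FROM example_db.example_tbl;
```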
2. Recommendations for Partition and Bucket Numbers
A table's total Tablet count equals Partition num * Bucket num (per replica); for example, a table with 10 partitions and 8 buckets per partition has 80 tablets per replica.
Quantity principle: The recommended Tablet count (without considering future scaling) should be slightly larger than the total number of disks in the cluster.
Data size principle: Ideally each Tablet holds 1 GB–10 GB of data; tablets that are too small hurt aggregation efficiency and add metadata-management overhead, while tablets that are too large impede replica migration and raise the retry cost of schema changes or rollups.
If the two principles conflict, prioritize the data size principle.
When creating a table, the bucket number for each partition is fixed, but for dynamically added partitions (via ADD PARTITION) you can specify a different bucket count to adapt to data growth or shrinkage.
Because a partition's bucket count cannot be changed later, plan for future cluster expansion when setting it.
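A minimal sketch of specifying a larger bucket count for a newly added partition (table, partition, and column names are hypothetical; the hash column must match the table's existing distribution column):

```sql
-- Existing partitions keep their original bucket count; only the new
-- partition gets 16 buckets to absorb growing daily volume.
ALTER TABLE sales_db.orders
ADD PARTITION p20240601 VALUES LESS THAN ("2024-07-01")
DISTRIBUTED BY HASH(order_id) BUCKETS 16;
```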
Examples: with 10 BE nodes, each with one disk, a 500 MB table may use 4–8 buckets; a 5 GB table, 8–16 buckets; a 50 GB table, 32 buckets; a 500 GB table should be partitioned, with each partition around 50 GB and 16–32 buckets; a 5 TB table should likewise be partitioned at roughly 50 GB per partition with 16–32 buckets each.
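For the 50 GB case in the example above, a table could be declared with 32 buckets directly; a sketch under those assumptions (all names hypothetical):

```sql
-- ~50 GB of data across 32 buckets keeps each tablet in the
-- recommended 1 GB-10 GB range.
CREATE TABLE example_db.user_events
(
    user_id    BIGINT,
    event_time DATETIME,
    payload    VARCHAR(256)
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 32
PROPERTIES ("replication_num" = "3");
```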
3. Problems from Non‑Standard Bucket Numbers
3.1 Too Many Buckets
Excessive bucket counts increase FE metadata load, degrading import and query performance, especially as data volume grows over time.
3.2 Too Few Buckets
Insufficient buckets cause each Tablet to exceed the recommended 10 GB limit, leading to slow compaction and possible load failures (e.g., Broker Load).
4. Bucket Number Standards
A table's total Tablet count equals Partition num * Bucket num.
Quantity principle: Tablet count should be slightly larger than the cluster's disk count.
Data size principle: Aim for 1 GB–10 GB per Tablet; prioritize this over the quantity principle when they conflict.
When adding partitions, you can specify a new bucket count to handle data scaling.
Bucket count is immutable per partition; consider future expansion when setting it.
For continuously growing data, apply the bucket recommendations above on a per-partition basis (the original article illustrated these in a reference table).
5. Automatic Bucket Feature
Manual bucket configuration requires accurate knowledge of current data size and future growth, which is not user‑friendly for non‑data engineers. Automatic bucket assignment simplifies this process (available only for partitioned tables) and requires Apache Doris 1.2.2 or later.
5.1 Table Creation Syntax
CREATE TABLE tbl1
(...)
[PARTITION BY RANGE(...)]
DISTRIBUTED BY HASH(k1) BUCKETS AUTO
PROPERTIES (
    "estimate_partition_size" = "10G"
);
BUCKETS AUTO enables automatic bucket number calculation. The optional estimate_partition_size property provides an initial per-partition data size estimate; if omitted, the default bucket count is 10.
The automatic bucket feature can also predict future bucket numbers based on historical partition data trends.
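Putting the syntax together, a sketch of a complete table definition on Doris 1.2.2+ that combines range partitioning with automatic buckets (all table, column, and partition names are hypothetical):

```sql
CREATE TABLE example_db.access_log
(
    log_time DATETIME,
    uid      BIGINT,
    url      VARCHAR(512)
)
DUPLICATE KEY(log_time)
PARTITION BY RANGE(log_time)
(
    PARTITION p202405 VALUES LESS THAN ("2024-06-01"),
    PARTITION p202406 VALUES LESS THAN ("2024-07-01")
)
DISTRIBUTED BY HASH(uid) BUCKETS AUTO
PROPERTIES
(
    -- Hint that each partition is expected to hold about 2 GB,
    -- so Doris can pick an initial bucket count accordingly.
    "estimate_partition_size" = "2G",
    "replication_num" = "3"
);
```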
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
