Advanced Doris Features: Rollup, Join Types, Colocation, Dynamic Partition, Bitmap, and Materialized Views
This article explains Doris database enhancements including Rollup indexes, Broadcast/Shuffle/Colocation joins, dynamic partition management, bitmap aggregation, and materialized view creation, providing syntax examples and operational principles for each feature.
1.1 Add Rollup
Rollup is a materialized index structure that stores data separately and can reorder columns to improve prefix index hit rates or reduce key columns for higher aggregation.
Example schema of table1 and how to add a rollup that aggregates pv by citycode:
+----------+-------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-------+---------+-------+
| siteid | int(11) | No | true | 10 | |
| citycode | smallint(6) | No | true | N/A | |
| username | varchar(32) | No | true | | |
| pv | bigint(20) | No | false | 0 | SUM |
| uv | bigint(20) | No | false | 0 | SUM |
+----------+-------------+------+-------+---------+-------+ ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv);1.2 Broadcast/Shuffle Join
Doris defaults to a Broadcast Join, broadcasting a filtered small table to all nodes of the large table to build an in‑memory hash table; if the small table cannot fit in memory, the join fails.
When memory limits are hit, explicitly specify a Shuffle (Partitioned) Join, which hashes both tables on the join key and distributes the work across the cluster.
Examples:
mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2; mysql> select sum(table1.pv) from table1 join [broadcast] table2 where table1.siteid = 2; mysql> select sum(table1.pv) from table1 join [shuffle] table2 where table1.siteid = 2;1.3 Colocation Join
1.3.1 Terminology
FE: Frontend node, handles metadata and request entry.
BE: Backend node, executes queries and stores data.
Colocation Group (CG): A group containing one or more tables that share the same colocation schema and data shard distribution.
Colocation Group Schema (CGS): Describes the common schema for tables in a CG, including bucket columns, bucket count, and replica count.
1.3.2 Principle
Colocation Join groups tables with identical CGS so that their shards reside on the same BE node, enabling local hash joins without network data transfer.
All tables in the same CG must have identical bucket columns, bucket count, and replica count.
1.3.3 Example
CREATE TABLE `tbl1` (
`k1` date NOT NULL COMMENT "",
`k2` int(11) NOT NULL COMMENT "",
`v1` int(11) SUM NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`, `k2`)
PARTITION BY RANGE(`k1`)(
PARTITION p1 VALUES LESS THAN ('2019-05-31'),
PARTITION p2 VALUES LESS THAN ('2019-06-30')
)
DISTRIBUTED BY HASH(`k2`) BUCKETS 8
PROPERTIES (
"colocate_with" = "group1"
); CREATE TABLE `tbl2` (
`k1` datetime NOT NULL COMMENT "",
`k2` int(11) NOT NULL COMMENT "",
`v1` double SUM NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`, `k2`)
DISTRIBUTED BY HASH(`k2`) BUCKETS 8
PROPERTIES (
"colocate_with" = "group1"
);When the colocation join is effective, the query plan shows colocate: true:
DESC SELECT * FROM tbl1 INNER JOIN tbl2 ON (tbl1.k2 = tbl2.k2);
... (plan fragment output) ...
| colocation: true |
...1.4 Dynamic Partition
1.4.1 Principle
Dynamic partitions automatically add or drop partitions based on a schedule configured by dynamic_partition_enable and dynamic_partition_check_interval_seconds in fe.conf.
1.4.2 Example
CREATE TABLE example_db.dynamic_partition (
k1 DATE,
k2 INT,
k3 SMALLINT,
v1 VARCHAR(2048),
v2 DATETIME DEFAULT "2014-02-04 15:36:00"
) ENGINE=olap
DUPLICATE KEY(k1, k2, k3)
PARTITION BY RANGE (k1) (
PARTITION p20200321 VALUES LESS THAN ("2020-03-22"),
PARTITION p20200322 VALUES LESS THAN ("2020-03-23"),
PARTITION p20200323 VALUES LESS THAN ("2020-03-24"),
PARTITION p20200324 VALUES LESS THAN ("2020-03-25")
)
DISTRIBUTED BY HASH(k2) BUCKETS 32
PROPERTIES(
"storage_medium" = "SSD",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-3",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32"
);The system deletes partitions older than the start window and creates future partitions up to the end window, respecting the naming prefix.
1.5 Bitmap Support
Doris uses Roaring Bitmap to reduce I/O, CPU, memory, and network usage during queries.
CREATE TABLE `pv_bitmap` (
`dt` int,
`page` varchar(10),
`user_id` bitmap bitmap_union
) AGGREGATE KEY(`dt`, page)
DISTRIBUTED BY HASH(`dt`) BUCKETS 2; select bitmap_count(bitmap_union(user_id)) from pv_bitmap;
select bitmap_union_count(user_id) from pv_bitmap;
select bitmap_union_int(id) from pv_bitmap;
-- BITMAP_UNION(expr): union of two bitmaps
-- BITMAP_COUNT(expr): cardinality of a bitmap
-- BITMAP_UNION_COUNT(expr): equivalent to BITMAP_COUNT(BITMAP_UNION(expr))
-- BITMAP_UNION_INT(expr): equivalent to COUNT(DISTINCT expr) for integer types1.6 Materialized View
Materialized views pre‑compute query results and store them as special tables, extending the capabilities of Rollup with richer aggregation functions.
create materialized view store_amt as
select store_id, sum(sale_amt) from sales_records group by store_id;Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
