Databases 13 min read

Advanced Doris Features: Rollup, Join Types, Colocation, Dynamic Partition, Bitmap, and Materialized Views

This article explains Doris database enhancements including Rollup indexes, Broadcast/Shuffle/Colocation joins, dynamic partition management, bitmap aggregation, and materialized view creation, providing syntax examples and operational principles for each feature.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Advanced Doris Features: Rollup, Join Types, Colocation, Dynamic Partition, Bitmap, and Materialized Views

1.1 Add Rollup

Rollup is a materialized index structure that stores data separately and can reorder columns to improve prefix index hit rates or reduce key columns for higher aggregation.

Example schema of table1 and how to add a rollup that aggregates pv by citycode:

+----------+-------------+------+-------+---------+-------+
| Field    | Type        | Null | Key   | Default | Extra |
+----------+-------------+------+-------+---------+-------+
| siteid   | int(11)     | No   | true  | 10      |       |
| citycode | smallint(6) | No   | true  | N/A     |       |
| username | varchar(32) | No   | true  |         |       |
| pv       | bigint(20)  | No   | false | 0       | SUM   |
| uv       | bigint(20)  | No   | false | 0       | SUM   |
+----------+-------------+------+-------+---------+-------+
ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv);

1.2 Broadcast/Shuffle Join

Doris defaults to a Broadcast Join, broadcasting a filtered small table to all nodes of the large table to build an in‑memory hash table; if the small table cannot fit in memory, the join fails.

When memory limits are hit, explicitly specify a Shuffle (Partitioned) Join, which hashes both tables on the join key and distributes the work across the cluster.

Examples:

mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2;
mysql> select sum(table1.pv) from table1 join [broadcast] table2 where table1.siteid = 2;
mysql> select sum(table1.pv) from table1 join [shuffle] table2 where table1.siteid = 2;

1.3 Colocation Join

1.3.1 Terminology

FE: Frontend node, handles metadata and request entry.

BE: Backend node, executes queries and stores data.

Colocation Group (CG): A group containing one or more tables that share the same colocation schema and data shard distribution.

Colocation Group Schema (CGS): Describes the common schema for tables in a CG, including bucket columns, bucket count, and replica count.

1.3.2 Principle

Colocation Join groups tables with identical CGS so that their shards reside on the same BE node, enabling local hash joins without network data transfer.

All tables in the same CG must have identical bucket columns, bucket count, and replica count.

1.3.3 Example

CREATE TABLE `tbl1` (
    `k1` date NOT NULL COMMENT "",
    `k2` int(11) NOT NULL COMMENT "",
    `v1` int(11) SUM NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`, `k2`)
PARTITION BY RANGE(`k1`)(
    PARTITION p1 VALUES LESS THAN ('2019-05-31'),
    PARTITION p2 VALUES LESS THAN ('2019-06-30')
)
DISTRIBUTED BY HASH(`k2`) BUCKETS 8
PROPERTIES (
    "colocate_with" = "group1"
);
CREATE TABLE `tbl2` (
    `k1` datetime NOT NULL COMMENT "",
    `k2` int(11) NOT NULL COMMENT "",
    `v1` double SUM NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`, `k2`)
DISTRIBUTED BY HASH(`k2`) BUCKETS 8
PROPERTIES (
    "colocate_with" = "group1"
);

When the colocation join is effective, the query plan shows colocate: true:

DESC SELECT * FROM tbl1 INNER JOIN tbl2 ON (tbl1.k2 = tbl2.k2);
... (plan fragment output) ...
|  colocation: true |
...

1.4 Dynamic Partition

1.4.1 Principle

Dynamic partitions automatically add or drop partitions based on a schedule configured by dynamic_partition_enable and dynamic_partition_check_interval_seconds in fe.conf.

1.4.2 Example

CREATE TABLE example_db.dynamic_partition (
    k1 DATE,
    k2 INT,
    k3 SMALLINT,
    v1 VARCHAR(2048),
    v2 DATETIME DEFAULT "2014-02-04 15:36:00"
) ENGINE=olap
DUPLICATE KEY(k1, k2, k3)
PARTITION BY RANGE (k1) (
    PARTITION p20200321 VALUES LESS THAN ("2020-03-22"),
    PARTITION p20200322 VALUES LESS THAN ("2020-03-23"),
    PARTITION p20200323 VALUES LESS THAN ("2020-03-24"),
    PARTITION p20200324 VALUES LESS THAN ("2020-03-25")
)
DISTRIBUTED BY HASH(k2) BUCKETS 32
PROPERTIES(
    "storage_medium" = "SSD",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
);

The system deletes partitions older than the start window and creates future partitions up to the end window, respecting the naming prefix.

1.5 Bitmap Support

Doris uses Roaring Bitmap to reduce I/O, CPU, memory, and network usage during queries.

CREATE TABLE `pv_bitmap` (
    `dt` int,
    `page` varchar(10),
    `user_id` bitmap bitmap_union
) AGGREGATE KEY(`dt`, page)
DISTRIBUTED BY HASH(`dt`) BUCKETS 2;
select bitmap_count(bitmap_union(user_id)) from pv_bitmap;
select bitmap_union_count(user_id) from pv_bitmap;
select bitmap_union_int(id) from pv_bitmap;
-- BITMAP_UNION(expr): union of two bitmaps
-- BITMAP_COUNT(expr): cardinality of a bitmap
-- BITMAP_UNION_COUNT(expr): equivalent to BITMAP_COUNT(BITMAP_UNION(expr))
-- BITMAP_UNION_INT(expr): equivalent to COUNT(DISTINCT expr) for integer types

1.6 Materialized View

Materialized views pre‑compute query results and store them as special tables, extending the capabilities of Rollup with richer aggregation functions.

create materialized view store_amt as
select store_id, sum(sale_amt) from sales_records group by store_id;
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Bitmapdynamic partitionJOINRollupdorisColocation
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.