Doris Performance Optimization: OLAP Query, Indexes, Vectorized Execution, and High‑Concurrency Point Queries
This article explains how Apache Doris achieves high‑concurrency OLAP and point‑query performance through MPP architecture, columnar storage, partition‑bucket pruning, various indexes, materialized views, vectorized execution, runtime filters, short‑circuit planning, and prepared‑statement caching.
OLAP Query
For high‑concurrency queries, the core challenge is balancing limited system resources against the load of concurrent execution: minimizing the CPU, memory, and I/O cost of each SQL statement by reducing the data scanned and the computation that follows.
2.1 MPP Architecture
Based on Massively Parallel Processing, Doris splits a query into many tasks that run in parallel across nodes, allowing linear scaling by adding compute resources.
2.2 Columnar Storage
Columnar format reads only the columns needed for a query, reducing disk I/O and memory load.
2.3 Partition and Bucket Pruning
Doris uses two‑level partitioning: Partition (often time‑based) and Bucket (hash‑based), improving read parallelism and query performance.
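The two levels of pruning can be sketched in a few lines of Python. This is an illustrative model, not Doris source: the partition names, date ranges, and `NUM_BUCKETS` value are made up for the example.

```python
# Sketch of Doris-style two-level pruning (illustrative, not Doris internals).
# Partitions are keyed by a date range; within a partition, rows are hashed
# into a fixed number of buckets on a distribution column.

NUM_BUCKETS = 8

partitions = {
    "p202401": ("2024-01-01", "2024-02-01"),
    "p202402": ("2024-02-01", "2024-03-01"),
}

def prune_partitions(query_date):
    """Keep only partitions whose [start, end) range covers query_date."""
    return [name for name, (start, end) in partitions.items()
            if start <= query_date < end]

def prune_bucket(user_id):
    """An equality predicate on the bucket column maps to exactly one bucket."""
    return hash(user_id) % NUM_BUCKETS

# A query like WHERE dt = '2024-02-15' AND user_id = 42
# touches a single partition and a single bucket:
print(prune_partitions("2024-02-15"))  # ['p202402']
```

With both predicates present, the scan shrinks from the whole table to one tablet, which is what makes the later index lookups cheap.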
2.4 Indexes and Materialized Views
Doris supports various indexes (Short Key, Ordinal, ZoneMap, Bitmap, Bloom Filter) and materialized views to accelerate queries and reduce on‑the‑fly computation.
2.4.1 Indexes
Short Key Index creates a sparse index entry every 1024 rows on sorted columns, enabling fast row location when query predicates contain those columns.
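The "one entry per 1024 rows" structure amounts to a sparse index over sorted data, searchable with binary search. A minimal sketch, with made-up key values:

```python
import bisect

# Sketch of a sparse short-key index (illustrative): one entry per
# 1024-row group, recording the first sort-key value of each group.
GROUP = 1024
sorted_keys = list(range(0, 10000))     # the data, sorted on the key column
sparse_index = sorted_keys[::GROUP]     # first key of every 1024-row group

def locate_group(key):
    """Binary-search the sparse index; only one 1024-row group then needs scanning."""
    g = bisect.bisect_right(sparse_index, key) - 1
    return max(g, 0)

g = locate_group(4321)
start = g * GROUP
assert start <= 4321 < start + GROUP  # the key falls inside the located group
```

The index stays tiny (one entry per group instead of per row), yet narrows any lookup to a single group before the row-level scan begins.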
Ordinal Index maps row numbers to the physical address of column data pages, acting as a primary index for other index types.
ZoneMap Index stores min/max statistics per segment and page, allowing Doris to skip irrelevant data blocks.
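The skipping logic is simple interval overlap: a page can be ruled out whenever its min/max range cannot intersect the predicate range. A minimal sketch, with invented page statistics:

```python
# Sketch of ZoneMap pruning (illustrative): per-page min/max statistics
# let the reader skip pages that cannot contain matching rows.
pages = [
    {"min": 1,    "max": 999,  "rows": "..."},
    {"min": 1000, "max": 1999, "rows": "..."},
    {"min": 2000, "max": 2999, "rows": "..."},
]

def pages_to_read(lo, hi):
    """Keep only pages whose [min, max] range overlaps the predicate range."""
    return [i for i, p in enumerate(pages)
            if p["max"] >= lo and p["min"] <= hi]

# The article's example, WHERE id > 10 AND id < 1024,
# needs only the first two pages:
print(pages_to_read(11, 1023))  # [0, 1]
```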
Bitmap Index is built for low‑cardinality columns; it records a map from key values to row IDs and uses Roaring bitmap encoding.
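The value-to-rows mapping can be sketched with a plain Python integer standing in for the bitmap (Doris actually uses Roaring bitmaps, which compress well on sparse and dense runs alike); the column values here are made up:

```python
# Sketch of a bitmap index on a low-cardinality column (illustrative;
# a Python int serves as the bitmap, bit i meaning "row i matches").
rows = ["CN", "US", "CN", "DE", "US", "CN"]

index = {}
for row_id, value in enumerate(rows):
    index[value] = index.get(value, 0) | (1 << row_id)

def row_ids(value):
    """Decode a value's bitmap back into its matching row IDs."""
    bm = index.get(value, 0)
    return [i for i in range(len(rows)) if bm >> i & 1]

print(row_ids("CN"))  # [0, 2, 5]
```

Because each distinct value owns one bitmap, equality predicates become cheap bitwise operations, which is why the index suits low-cardinality columns.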
Bloom Filter Index is suitable for high‑cardinality columns and quickly determines if a value may exist, avoiding unnecessary file reads.
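A Bloom filter answers "definitely absent" or "possibly present": k hash functions set k bits on insert, and a lookup that finds any of those bits unset proves the value was never added. A minimal sketch (parameters `M` and `K` are arbitrary illustration values):

```python
import hashlib

# Minimal Bloom filter sketch (illustrative): K hash functions set K bits;
# a lookup that finds any bit unset proves the value is absent.
M, K = 1024, 3
bits = bytearray(M // 8)

def _positions(value):
    for seed in range(K):
        h = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        yield int.from_bytes(h[:4], "big") % M

def add(value):
    for pos in _positions(value):
        bits[pos // 8] |= 1 << (pos % 8)

def might_contain(value):
    return all(bits[pos // 8] >> (pos % 8) & 1 for pos in _positions(value))

add("user_42")
assert might_contain("user_42")   # inserted values always pass
# For most other values might_contain(...) is False, so the file is skipped.
```

False positives are possible but false negatives are not, so a negative answer safely eliminates a file read.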
Example query:
select * from user_table where id > 10 and id < 1024;

2.4.2 Materialized Views
Materialized views pre‑compute results of defined SQL statements and store them as physical tables, reducing runtime computation for frequent aggregations.
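What the pre-computation buys can be sketched as follows; the table contents and the view definition in the comment are invented for illustration:

```python
from collections import defaultdict

# Sketch of a materialized view (illustrative): the aggregate is maintained
# ahead of time, so a query reads a tiny precomputed table instead of
# scanning and grouping the base table at runtime.
base_table = [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7)]

# Conceptually: CREATE MATERIALIZED VIEW ... AS
#   SELECT dt, SUM(amount) FROM base_table GROUP BY dt
mv_sum_by_dt = defaultdict(int)
for dt, amount in base_table:
    mv_sum_by_dt[dt] += amount

print(mv_sum_by_dt["2024-01-01"])  # 15, answered without rescanning base_table
```

In Doris the view is maintained incrementally as data is loaded, and the optimizer transparently rewrites matching queries to hit the view.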
2.5 Vectorized Query Execution
Doris processes a whole block of a column at once instead of row‑by‑row, improving CPU utilization.
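The difference between the two execution styles can be sketched on the article's example predicate; this is a conceptual model only (real vectorized engines operate on typed column blocks with SIMD-friendly inner loops):

```python
# Sketch contrasting row-at-a-time with block-at-a-time (vectorized)
# evaluation of the filter id > 10 AND id < 1024 (illustrative).
ids = list(range(5000))

def filter_row_at_a_time(rows):
    out = []
    for v in rows:          # interpreter overhead is paid once per row
        if v > 10 and v < 1024:
            out.append(v)
    return out

def filter_vectorized(column, block_size=1024):
    out = []
    for i in range(0, len(column), block_size):
        block = column[i:i + block_size]   # a whole block of one column
        out.extend([v for v in block if 10 < v < 1024])
    return out

assert filter_row_at_a_time(ids) == filter_vectorized(ids)
```

Processing a block at a time amortizes per-call overhead and keeps the working set in cache, which is where the CPU-utilization gain comes from.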
2.6 Runtime Filter
During join processing, Doris builds a hash table on the build side and generates a runtime filter (e.g., min/max, IN) that is pushed down to the probe side, pruning data early and reducing network traffic.
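The effect of pushing a join-derived filter down to the scan can be sketched as follows (illustrative; the build- and probe-side key sets are invented):

```python
# Sketch of a runtime filter (illustrative): the join build side yields an
# IN-set plus min/max bounds, which the probe-side scan applies before
# any rows are sent to the join.
build_side = [5, 9, 12]            # small dimension-table join keys
probe_side = list(range(1000))     # large fact-table join keys

in_set = set(build_side)
lo, hi = min(build_side), max(build_side)

# Pushed-down filter: cheap min/max range check first, then the exact IN check.
survivors = [k for k in probe_side if lo <= k <= hi and k in in_set]

print(survivors)  # [5, 9, 12] — only 3 of 1000 probe rows reach the join
```

Since the filter is applied at the scan, the pruned rows never consume network bandwidth or join-hash-table probes.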
Overall, these optimizations prune unnecessary data, lower I/O, CPU, and memory consumption, and boost concurrency.
High‑Concurrency Point Queries
Beyond the OLAP optimizations above, Doris adds several features aimed specifically at high‑concurrency point (single‑row) queries:
Row Store Format – storing a row copy alongside columnar data for faster single‑row retrieval.
Row Cache – a cache for whole rows to improve point‑query hit rates.
Short‑Circuit Path – FE generates a lightweight plan for point queries, bypassing full MPP planning and scheduling.
Prepared Statements – FE caches parsed SQL and expressions in a session hash map, reducing CPU overhead for repeated queries.
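The last item can be sketched as a session-level statement cache; this is a conceptual model of the idea, not Doris FE code, and the `Session` class and its method names are invented for illustration:

```python
# Sketch of a session-level prepared-statement cache (illustrative): the
# statement is parsed and planned once, keyed by its SQL text in a session
# hash map; repeated executions reuse the cached plan with new parameters.
class Session:
    def __init__(self):
        self._plans = {}        # SQL text -> cached "plan"
        self.parse_count = 0

    def _parse(self, sql):
        self.parse_count += 1   # stand-in for expensive parse/plan work
        return ("PLAN", sql)

    def execute(self, sql, params):
        plan = self._plans.get(sql)
        if plan is None:
            plan = self._parse(sql)
            self._plans[sql] = plan
        return (plan, params)

s = Session()
for uid in (1, 2, 3):
    s.execute("select * from user_table where id = ?", (uid,))
print(s.parse_count)  # 1 — parsed once, executed three times
```

For point-query workloads that repeat the same statement shape thousands of times per second, removing the per-execution parse cost is a large share of the CPU saving.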
Conclusion
Doris achieves high‑performance concurrent queries through MPP architecture, columnar storage, partition‑bucket pruning, vectorized engine, and various indexes and materialized views.
Additional features such as row store format, short‑circuit point‑query path, prepared statements, and row cache enable single‑node throughput of tens of thousands of QPS.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
