Apache Doris 2.0-beta Release: New Query Optimizer, Pipeline Execution Engine, Workload Management and Major Performance Improvements
Apache Doris 2.0-beta, released on July 3, 2023, introduces a new Cascades‑based query optimizer, adaptive pipeline execution engine, workload‑aware resource isolation, enhanced memory management, partial column updates, multi‑catalog support, and numerous performance gains across real‑time analytics, ETL, and high‑concurrency point queries.
Apache Doris 2.0-beta was officially released on July 3, 2023, with over 255 contributors delivering more than 3500 optimizations and fixes.
Download links: https://doris.apache.org/download and source code at https://github.com/apache/doris/tree/branch-2.0 .
The roadmap emphasizes building a unified data‑analysis service that supports both online and offline workloads, high‑throughput interactive queries, and seamless analysis of semi‑structured and unstructured data, aiming to reduce complexity and operational cost.
Key technical challenges include guaranteeing stable high‑frequency writes, handling schema changes, supporting mixed query loads, efficient SQL execution, and resource isolation.
New Query Optimizer: A Cascades-based optimizer with richer statistics collection and adaptive tuning delivers a more than 10× performance gain on TPC-H in blind tests without any manual tuning, and fully supports TPC-DS.
Pipeline Execution Engine: Replaces the pull-based volcano model with a data-driven pipeline model that decouples execution at blocking operators, enables asynchronous execution and adaptive thread scheduling, and improves performance under mixed query loads.
Workload Management: Introduces workload groups with configurable CPU share, memory limit, concurrency, queue size, and queue timeout, allowing fine-grained isolation of CPU and memory per query.
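As a conceptual illustration only (not Doris internals), the idea of decoupling a plan at a blocking operator can be sketched in Python: a blocking sort splits execution into two independently scheduled pipelines connected by a bounded queue, instead of one deep pull-based call chain. All names here are hypothetical.

```python
from queue import Queue
from threading import Thread

# Toy sketch of a push-based pipeline model (NOT Doris source code).
# A blocking operator (here: sort) splits the plan into two pipelines
# that run as independent tasks instead of one blocking call chain.

def scan(rows, out: Queue):
    """Pipeline 1: scan pushes rows downstream, then signals end-of-stream."""
    for row in rows:
        out.put(row)
    out.put(None)  # end-of-stream marker

def sort_then_emit(inp: Queue, results: list):
    """Pipeline 2: the sort operator must buffer everything from pipeline 1
    before it can emit; afterwards output flows without further blocking."""
    buf = []
    while (row := inp.get()) is not None:
        buf.append(row)
    results.extend(sorted(buf))

q: Queue = Queue(maxsize=4)  # bounded queue provides back-pressure between pipelines
results: list = []
t1 = Thread(target=scan, args=([3, 1, 2], q))
t2 = Thread(target=sort_then_emit, args=(q, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)  # [1, 2, 3]
```

The bounded queue is the key design point: the scan task yields when the queue fills rather than monopolizing a thread, which is the behavior the pipeline engine's adaptive thread scheduling generalizes.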
Configuration examples:
create workload group if not exists etl_group
properties (
"cpu_share"="10",
"memory_limit"="30%",
"max_concurrency"="10",
"max_queue_size"="20",
"queue_timeout"="3000"
);

Enable the new optimizer and pipeline engine by default:

SET enable_nereids_planner=true;
SET enable_pipeline_engine=true;

Memory management has been overhauled with unified memory structures, soft limits, and GC mechanisms, eliminating most OOM-related failures.
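Assuming cpu_share works as a relative weight (a group's CPU entitlement under contention is its share divided by the sum of all active groups' shares), the etl_group above would get a quarter of the CPU when competing with a hypothetical second group of share 30:

```python
# Illustrative arithmetic only; "adhoc_group" and its share are hypothetical.
# Under the relative-weight assumption, a group's CPU fraction under
# contention = its cpu_share / sum of all active groups' cpu_share values.
groups = {"etl_group": 10, "adhoc_group": 30}

def cpu_fraction(name: str, shares: dict) -> float:
    return shares[name] / sum(shares.values())

print(cpu_fraction("etl_group", groups))    # 0.25
print(cpu_fraction("adhoc_group", groups))  # 0.75
```

When only one group is active, its fraction is 1.0, which is why cpu_share acts as a soft limit rather than a hard cap.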
Import performance is boosted by up to 200% for Stream Load and up to 150% for INSERT-SELECT on TPC-H datasets.
Partial column updates are now supported; example:
mysql> desc user_profile;
+------------------+----------------+------+-------+---------+-------+
| Field            | Type           | Null | Key   | Default | Extra |
+------------------+----------------+------+-------+---------+-------+
| id               | INT            | YES  | true  | NULL    |       |
| name             | VARCHAR(10)    | YES  | false | NULL    | NONE  |
| age              | INT            | YES  | false | NULL    | NONE  |
| city             | VARCHAR(10)    | YES  | false | NULL    | NONE  |
| balance          | DECIMALV3(9,0) | YES  | false | NULL    | NONE  |
| last_access_time | DATETIME       | YES  | false | NULL    | NONE  |
+------------------+----------------+------+-------+---------+-------+

Load the updated columns via Stream Load:
curl --location-trusted -u root: \
-H "partial_columns:true" \
-H "column_separator:," \
-H "columns:id,balance,last_access_time" \
-T /tmp/test.csv \
http://127.0.0.1:48037/api/db1/user_profile/_stream_load

Additional enhancements include multi-catalog support (Hudi, JDBC, Iceberg), new Map/Struct data types, lakehouse file caching, improved ORC/Parquet reading, row-column hybrid storage with a short-path optimization for point queries, and cross-cluster data replication (CCR).
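Semantically, the partial-column update above merges the incoming rows with existing rows on the key column, overwriting only the columns listed in the load while all other columns keep their previous values. A minimal Python sketch of that merge logic (a toy model, not the storage-layer implementation):

```python
# Toy model of partial-column update semantics (not Doris internals):
# incoming rows carry only the listed columns; unlisted columns retain
# their previously stored values.
def partial_update(table: dict, incoming: list, columns: list) -> None:
    """table maps key -> row dict; each incoming row supplies only `columns`."""
    for row in incoming:
        existing = table.setdefault(row["id"], {})
        for col in columns:
            existing[col] = row[col]

# Hypothetical existing row: name and balance were loaded earlier.
profiles = {1: {"id": 1, "name": "Anna", "balance": 100}}
partial_update(
    profiles,
    [{"id": 1, "balance": 500, "last_access_time": "2023-07-03 10:00:00"}],
    ["id", "balance", "last_access_time"],
)
print(profiles[1]["name"], profiles[1]["balance"])  # Anna 500
```

Note that `name` survives untouched even though the load never mentions it, which is exactly what the `partial_columns:true` header requests.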
Upgrade notes: clusters on 1.2-LTS can be rolling-upgraded to 2.0-beta; the new optimizer is enabled by default; the legacy non-vectorized engine has been removed; new parameters such as enable_single_replica_compaction have been added; and FQDN support eases Kubernetes deployments.
For full details, refer to the official documentation links provided throughout the announcement.
DataFunTalk