Apache Doris 2.0-beta Release: New Query Optimizer, Pipeline Execution Engine, Workload Management and Major Performance Improvements
Apache Doris 2.0-beta, released on July 3, 2023, introduces a new Cascades‑based query optimizer, adaptive pipeline execution engine, workload‑aware resource isolation, enhanced memory management, partial column updates, multi‑catalog support, and numerous performance gains across real‑time analytics, ETL, and high‑concurrency point queries.
Apache Doris 2.0-beta was officially released on July 3, 2023, with over 255 contributors delivering more than 3500 optimizations and fixes.
Download links: https://doris.apache.org/download and source code at https://github.com/apache/doris/tree/branch-2.0 .
The roadmap emphasizes building a unified data‑analysis service that supports both online and offline workloads, high‑throughput interactive queries, and seamless analysis of semi‑structured and unstructured data, aiming to reduce complexity and operational cost.
Key technical challenges include guaranteeing stable high‑frequency writes, handling schema changes, supporting mixed query loads, efficient SQL execution, and resource isolation.
New Query Optimizer: A Cascades-based optimizer with richer statistics collection and adaptive tuning delivers a more than 10× performance gain on TPC-H in blind tests without any manual tuning, and fully supports TPC-DS.
Pipeline Execution Engine: Replaces the pull-based volcano model with a data-driven pipeline model that decouples execution at blocking operators, enables asynchronous execution and adaptive thread scheduling, and improves performance under mixed query loads.
Workload Management: Introduces workload groups with configurable CPU share, memory limit, concurrency, queue size, and queue timeout, allowing fine-grained isolation of CPU and memory per query.
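As a conceptual illustration only (not Doris internals), the idea of decoupling a plan at a blocking operator can be sketched in Python: a blocking sort splits execution into two independently scheduled pipelines connected by a bounded queue, instead of one deep pull-based call chain. All names here are hypothetical.

```python
from queue import Queue
from threading import Thread

# Toy sketch of a push-based pipeline model (NOT Doris source code).
# A blocking operator (here: sort) splits the plan into two pipelines
# that run as independent tasks instead of one blocking call chain.

def scan(rows, out: Queue):
    """Pipeline 1: scan pushes rows downstream, then signals end-of-stream."""
    for row in rows:
        out.put(row)
    out.put(None)  # end-of-stream marker

def sort_then_emit(inp: Queue, results: list):
    """Pipeline 2: the sort operator must buffer everything from pipeline 1
    before it can emit; afterwards output flows without further blocking."""
    buf = []
    while (row := inp.get()) is not None:
        buf.append(row)
    results.extend(sorted(buf))

q: Queue = Queue(maxsize=4)  # bounded queue provides back-pressure between pipelines
results: list = []
t1 = Thread(target=scan, args=([3, 1, 2], q))
t2 = Thread(target=sort_then_emit, args=(q, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)  # [1, 2, 3]
```

The bounded queue is the key design point: the scan task yields when the queue fills rather than monopolizing a thread, which is the behavior the pipeline engine's adaptive thread scheduling generalizes.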
Configuration examples:
create workload group if not exists etl_group
properties (
"cpu_share"="10",
"memory_limit"="30%",
"max_concurrency"="10",
"max_queue_size"="20",
"queue_timeout"="3000"
);

Enable the new optimizer and pipeline engine by default:

SET enable_nereids_planner=true;
SET enable_pipeline_engine=true;

Memory management has been overhauled with unified memory structures, soft limits, and GC mechanisms, eliminating most OOM-related failures.
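Assuming cpu_share works as a relative weight (a group's CPU entitlement under contention is its share divided by the sum of all active groups' shares), the etl_group above would get a quarter of the CPU when competing with a hypothetical second group of share 30:

```python
# Illustrative arithmetic only; "adhoc_group" and its share are hypothetical.
# Under the relative-weight assumption, a group's CPU fraction under
# contention = its cpu_share / sum of all active groups' cpu_share values.
groups = {"etl_group": 10, "adhoc_group": 30}

def cpu_fraction(name: str, shares: dict) -> float:
    return shares[name] / sum(shares.values())

print(cpu_fraction("etl_group", groups))    # 0.25
print(cpu_fraction("adhoc_group", groups))  # 0.75
```

When only one group is active, its fraction is 1.0, which is why cpu_share acts as a soft limit rather than a hard cap.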
Import performance is boosted by up to 200% for Stream Load and up to 150% for INSERT-SELECT on TPC-H datasets.
Partial column updates are now supported; example:
mysql> desc user_profile;
+------------------+----------------+------+-------+---------+-------+
| Field            | Type           | Null | Key   | Default | Extra |
+------------------+----------------+------+-------+---------+-------+
| id               | INT            | YES  | true  | NULL    |       |
| name             | VARCHAR(10)    | YES  | false | NULL    | NONE  |
| age              | INT            | YES  | false | NULL    | NONE  |
| city             | VARCHAR(10)    | YES  | false | NULL    | NONE  |
| balance          | DECIMALV3(9,0) | YES  | false | NULL    | NONE  |
| last_access_time | DATETIME       | YES  | false | NULL    | NONE  |
+------------------+----------------+------+-------+---------+-------+

Load the updated columns via Stream Load:
curl --location-trusted -u root: \
-H "partial_columns:true" \
-H "column_separator:," \
-H "columns:id,balance,last_access_time" \
-T /tmp/test.csv \
http://127.0.0.1:48037/api/db1/user_profile/_stream_load

Additional enhancements include multi-catalog support (Hudi, JDBC, Iceberg), new Map/Struct data types, lakehouse file caching, improved ORC/Parquet reading, row-column hybrid storage with a short-path optimization for point queries, and cross-cluster data replication (CCR).
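Semantically, the partial-column update above merges the incoming rows with existing rows on the key column, overwriting only the columns listed in the load while all other columns keep their previous values. A minimal Python sketch of that merge logic (a toy model, not the storage-layer implementation):

```python
# Toy model of partial-column update semantics (not Doris internals):
# incoming rows carry only the listed columns; unlisted columns retain
# their previously stored values.
def partial_update(table: dict, incoming: list, columns: list) -> None:
    """table maps key -> row dict; each incoming row supplies only `columns`."""
    for row in incoming:
        existing = table.setdefault(row["id"], {})
        for col in columns:
            existing[col] = row[col]

# Hypothetical existing row: name and balance were loaded earlier.
profiles = {1: {"id": 1, "name": "Anna", "balance": 100}}
partial_update(
    profiles,
    [{"id": 1, "balance": 500, "last_access_time": "2023-07-03 10:00:00"}],
    ["id", "balance", "last_access_time"],
)
print(profiles[1]["name"], profiles[1]["balance"])  # Anna 500
```

Note that `name` survives untouched even though the load never mentions it, which is exactly what the `partial_columns:true` header requests.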
Upgrade notes: clusters on 1.2-LTS can be rolling-upgraded to 2.0-beta; the new optimizer is enabled by default; the legacy non-vectorized engine has been removed; new parameters such as enable_single_replica_compaction have been added; and FQDN support eases Kubernetes deployments.
For full details, refer to the official documentation links provided throughout the announcement.
DataFunTalk