Apache Doris 2.0‑beta Release: New Query Optimizer, Pipeline Engine, Workload Management and Performance Enhancements
Apache Doris 2.0‑beta, released on July 3, 2023, introduces a modern Cascades‑based query optimizer, a data‑driven pipeline execution engine, fine‑grained workload groups, enhanced memory management, partial‑column updates, compute nodes, cold‑hot tiering and cross‑cluster replication, delivering up to tenfold speedups and significant cost reductions for real‑time analytics.
Apache Doris 2.0‑beta was officially released on July 3, 2023, with over 255 contributors delivering more than 3,500 improvements.
Download links and source code are provided:
Download: https://doris.apache.org/download GitHub source: https://github.com/apache/doris/tree/branch-2.0
The release follows the 2023 roadmap aiming to support unified online and offline analytics, lake‑warehouse integration, and multi‑modal data processing while reducing system complexity for users.
Key technical innovations include a brand‑new query optimizer based on the Cascades framework (Nereids) that delivers over 10× speedup on TPC‑H benchmarks without manual tuning, and a modern pipeline execution engine that makes query execution data‑driven, improves resource isolation, and boosts mixed‑load performance.
Workload groups enable fine‑grained CPU and memory quotas, query queuing, and automatic resource reclamation, with configuration examples such as:
create workload group if not exists etl_group properties ("cpu_share"="10","memory_limit"="30%","max_concurrency"="10","max_queue_size"="20","queue_timeout"="3000");Memory management has been overhauled with a new MemTracker, soft limits, and GC mechanisms, eliminating most OOM incidents.
Partial‑column updates are now supported for Unique‑Key tables; an example using Stream Load:
curl --location-trusted -u root: -H "partial_columns:true" -H "column_separator:," -H "columns:id,balance,last_access_time" -T /tmp/test.csv http://127.0.0.1:48037/api/db1/user_profile/_stream_loadAdditional enhancements include compute nodes for cloud‑native lakehouse queries, cold‑hot data tiering that reduces storage cost by up to 70 %, new Map/Struct data types, row‑column hybrid storage with point‑query shortcuts, and a CCR cross‑cluster replication feature.
Upgrade notes: the Nereids planner is enabled by default (enable_nereids_planner=true), the vectorized engine flag has been removed, and several configuration defaults have changed (e.g., enable_single_replica_compaction, default use of datev2/datetimev2/decimalv3).
Extensive documentation and links are provided for download, configuration, and detailed feature guides.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.