
Insights into the Design and Challenges of Doris' New Optimizer (Nereids)

The article explains why Doris needed a new optimizer and walks through its design: rule-based and cost-based stages, techniques for reducing data size early, dynamic-programming join-reorder methods, and practical challenges such as statistics errors and runtime filters. It closes with performance results and a Q&A session.

DataFunTalk

All database users care about how the optimizer improves SQL performance, which is why Doris invested heavily in a new optimizer. The presentation is organized into four parts: reshaping the optimizer, the essence of optimization, performance bottlenecks, and challenges.

The reshaping section highlights the shortcomings of the old optimizer: no abstract rule representation, no cost-based optimizer (CBO) framework, cost-model code scattered across the codebase, and difficulty observing how individual rules affect plans. These issues motivated a new optimizer that integrates rule abstraction with a unified CBO.

The essence section explains that SQL is declarative; the optimizer decides the execution plan. The pipeline consists of parsing, logical rewrite (RBO), cost estimation (CBO), and plan translation. RBO applies rules such as predicate push‑down, expression rewrite, constant folding, and empty‑operator elimination, while CBO focuses on join order, CTE handling, and aggregation strategies.
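To make the RBO stage concrete, here is a minimal sketch of one of the rewrite rules mentioned above, constant folding, applied bottom-up to a toy expression tree. The node types and structure are illustrative assumptions, not Doris's actual plan representation.

```python
# Constant folding as a rule-based (RBO) rewrite on a toy expression tree.
# Node classes here are hypothetical, chosen only to illustrate the idea.
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Add]


def fold_constants(e: Expr) -> Expr:
    """Bottom-up rewrite: collapse Add(Const, Const) into a single Const."""
    if isinstance(e, Add):
        left = fold_constants(e.left)
        right = fold_constants(e.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(left.value + right.value)
        return Add(left, right)
    return e


# A predicate like WHERE x + (1 + 2) is rewritten to WHERE x + 3 once at
# plan time, instead of evaluating 1 + 2 for every row at execution time.
```

The same pattern-match-then-rewrite shape generalizes to the other rules the talk lists, such as predicate push-down and empty-operator elimination.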

Performance bottlenecks are addressed by reducing data size early. A TPC-H example shows how filtering customers and suppliers by country before joining orders can cut data volume dramatically, yielding a 2-3× speedup. Join-order optimization uses dynamic-programming methods: Cascades-style (memo-based) search and DPhyper, which complement each other.
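The dynamic-programming idea behind these join-reorder methods can be sketched as a search over table subsets, keeping the cheapest plan found for each subset. The cardinalities, the fixed join selectivity, and the cost function (sum of intermediate result sizes) below are toy assumptions for illustration, not the cost model Doris uses.

```python
# Dynamic-programming join reordering over table subsets: for each subset,
# remember the cheapest way to produce it from two smaller subsets.
# Selectivity and cost model are deliberately simplistic.
from itertools import combinations


def best_join_order(tables, card, join_selectivity=0.1):
    """tables: list of table names; card: dict name -> row count.
    Returns (cost, estimated rows, plan) where cost sums the estimated
    sizes of all intermediate join results."""
    # best[frozenset of tables] = (cost, estimated rows, plan tree)
    best = {frozenset([t]): (0.0, float(card[t]), t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            s = frozenset(subset)
            for k in range(1, size):
                for left in combinations(subset, k):
                    l, r = frozenset(left), s - frozenset(left)
                    lcost, lrows, lplan = best[l]
                    rcost, rrows, rplan = best[r]
                    rows = lrows * rrows * join_selectivity
                    cost = lcost + rcost + rows
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, rows, (lplan, rplan))
    return best[frozenset(tables)]
```

With three tables of 10, 100, and 1000 rows, the cheapest plan joins the two small tables first, mirroring the talk's point that shrinking data early is what join reordering is really buying you. Real enumerators (memo-based Cascades search, DPhyper over the join hypergraph) prune this exponential space far more aggressively.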

The challenges discussed include balancing fairness and efficiency in join‑reorder (Left‑Deep, ZigZag, Bushy trees), handling estimation errors from statistics (sampling, NDV, uniformity assumptions), and evaluating cost models with tools like qError and Plan Ranker. Runtime Filters are presented as a disruptive technique that can dramatically improve join performance by early filtering across tables, though they introduce uncertainty and latency.
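The qError metric mentioned above has a simple definition: the multiplicative factor by which a cardinality estimate deviates from the true value, in whichever direction. A minimal sketch:

```python
# q-error: the standard metric for cardinality-estimation accuracy.
# Symmetric in over- and under-estimation; 1.0 means a perfect estimate.
def q_error(estimated: float, actual: float) -> float:
    if estimated <= 0 or actual <= 0:
        raise ValueError("q-error is defined for positive cardinalities")
    return max(estimated / actual, actual / estimated)


# Estimating 50 rows when the actual count is 200 is off by a factor of 4:
# q_error(50, 200) == 4.0, the same as overestimating 800 against 200.
```

Because it is multiplicative rather than additive, qError treats estimating 50 against 200 the same as 800 against 200, which matches how estimation errors compound through a join tree.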

The talk concludes with the upcoming Doris 2.0 release featuring the Nereids optimizer, an invitation to try it, and a brief Q&A covering topics such as CostAndEnforce, Runtime Filter behavior, pagination support, and the potential role of AI in future optimizer research.

Database Performance · Doris · Join Reorder · Query Optimizer · Cost-Based Optimization · Rule-Based Optimization · Runtime Filter
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
