Design and Implementation of ByteHouse Query Optimizer
The article explains how ByteHouse extends ClickHouse with a full‑featured query optimizer—including rule‑based and cost‑based techniques, analyzer modules, plan construction, and distributed optimization—to overcome ClickHouse limitations and achieve significant performance gains on complex OLAP workloads.
The ByteHouse team identified several limitations in ClickHouse, including no upsert/delete support, weak multi‑table joins, reduced availability at large scale, and no resource isolation. To address them, the team set out to build a comprehensive data‑analysis platform by extending ClickHouse's capabilities.
In relational databases, the query optimizer is a core component; in OLAP scenarios it is even more critical because complex queries require efficient execution plans, and a good optimizer can prevent slow performance caused by poorly written SQL.
Optimizers generally fall into two categories: rule‑based optimization (RBO) and cost‑based optimization (CBO), and effective systems combine both approaches.
RBO applies a fixed sequence of transformation rules to rewrite relational expressions. Because the rules ignore data distribution, the same SQL text always yields the same execution plan, but semantically equivalent queries written with different syntax can still produce different plans.
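To make the rule-based idea concrete, here is a minimal sketch in Python. It is not ByteHouse code: the expression classes (`Lit`, `Col`, `BinOp`) and the two rules are hypothetical names invented for illustration. The sketch applies a fixed list of rules bottom-up in a single pass, so the result depends only on the input expression and never on table statistics.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical expression tree: literals, column references, binary operators.
@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # e.g. "+", "=", "AND"
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, BinOp]
Rule = Callable[[Expr], Expr]


def fold_constants(e: Expr) -> Expr:
    """Rule 1: replace `literal + literal` with the computed literal."""
    if (isinstance(e, BinOp) and e.op == "+"
            and isinstance(e.left, Lit) and isinstance(e.right, Lit)):
        return Lit(e.left.value + e.right.value)
    return e


def drop_true_conjunct(e: Expr) -> Expr:
    """Rule 2: simplify `p AND TRUE` (or `TRUE AND p`) to `p`."""
    if isinstance(e, BinOp) and e.op == "AND":
        if isinstance(e.left, Lit) and e.left.value is True:
            return e.right
        if isinstance(e.right, Lit) and e.right.value is True:
            return e.left
    return e


# The rule order is fixed, which is what makes the rewrite deterministic.
RULES: List[Rule] = [fold_constants, drop_true_conjunct]


def rewrite(e: Expr) -> Expr:
    """Rewrite children first, then apply every rule once to this node."""
    if isinstance(e, BinOp):
        e = BinOp(e.op, rewrite(e.left), rewrite(e.right))
    for rule in RULES:
        e = rule(e)
    return e


# x = 1 + 2 AND TRUE
expr = BinOp("AND", BinOp("=", Col("x"), BinOp("+", Lit(1), Lit(2))), Lit(True))
simplified = rewrite(expr)
```

Here `simplified` is the tree for `x = 3`. A production optimizer would carry many more rules and typically iterate until no rule fires, but the principle of a data-independent, ordered rule list is the same.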
CBO generates multiple candidate plans, evaluates each using statistics and a cost model, and selects the plan with the lowest estimated cost.
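The following Python sketch illustrates the cost-based loop under simplified assumptions. The table names, row counts, and join selectivities are invented for illustration, and the cost model (the sum of estimated intermediate result sizes for a left-deep join order, with cross joins disallowed) is a textbook simplification rather than ByteHouse's actual model.

```python
import math
from itertools import permutations

# Hypothetical statistics: row counts per table and selectivity per join edge.
ROWS = {"fact": 1_000_000, "dim_a": 1_000, "dim_b": 10}
SELECTIVITY = {
    frozenset({"fact", "dim_a"}): 1 / 1_000,
    frozenset({"fact", "dim_b"}): 1 / 100,
}


def estimate_cost(order):
    """Estimated cost of a left-deep join executed in the given table order.

    Cost is the sum of the estimated sizes of all intermediate results.
    Orders that would require a cross join get infinite cost.
    """
    joined = {order[0]}
    rows = float(ROWS[order[0]])
    cost = 0.0
    for table in order[1:]:
        edges = [
            SELECTIVITY[frozenset({j, table})]
            for j in joined
            if frozenset({j, table}) in SELECTIVITY
        ]
        if not edges:
            return math.inf  # no join condition links this table yet
        rows = rows * ROWS[table] * math.prod(edges)
        cost += rows
        joined.add(table)
    return cost


def best_order(tables):
    """Enumerate every candidate order and keep the cheapest estimate."""
    return min(permutations(tables), key=estimate_cost)


best = best_order(["fact", "dim_a", "dim_b"])
```

With these made-up numbers, the cheapest candidates join `dim_b` before `dim_a`, because that join shrinks the intermediate result earlier. Exhaustive enumeration grows factorially with the number of tables, which is why real optimizers prune the search space or fall back to heuristics for large joins.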
ByteHouse's optimizer architecture consists of three main modules: Analyzers (including QueryRewriter and QueryAnalyzer), QueryPlan, and Optimizer.
The Analyzers module handles AST rewriting for both ANSI SQL and ClickHouse‑specific syntax: it supports CTE/view expansion, UDF expansion, and function rewrites, and it ensures semantic consistency. QueryAnalyzer then validates the rewritten AST.
QueryPlan builds an initial logical plan based on the analysis results, adds serialization capabilities, and focuses on relational‑algebra semantics rather than execution details.
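As a rough illustration of a serializable logical plan, the Python sketch below models a plan as a tree of generic nodes and round-trips it through JSON. The `PlanNode` class, its fields, and the node kinds are assumptions made for this example; they do not reflect ByteHouse's actual QueryPlan classes or wire format.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical logical plan node: says what to compute, not how."""
    kind: str                                   # e.g. "TableScan", "Filter"
    attrs: dict = field(default_factory=dict)   # operator-specific properties
    children: List["PlanNode"] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Convert the subtree rooted here into plain JSON-compatible data."""
        return {
            "kind": self.kind,
            "attrs": self.attrs,
            "children": [child.to_dict() for child in self.children],
        }

    @staticmethod
    def from_dict(data: dict) -> "PlanNode":
        """Rebuild a plan subtree from the output of to_dict()."""
        return PlanNode(
            data["kind"],
            data["attrs"],
            [PlanNode.from_dict(child) for child in data["children"]],
        )


# SELECT name FROM users WHERE age > 30, as a logical plan tree.
plan = PlanNode("Projection", {"columns": ["name"]}, [
    PlanNode("Filter", {"predicate": "age > 30"}, [
        PlanNode("TableScan", {"table": "users"}),
    ]),
])

wire = json.dumps(plan.to_dict())                 # serialize
restored = PlanNode.from_dict(json.loads(wire))   # deserialize
```

The nodes carry only relational-algebra information (what to scan, filter, and project) and no execution details such as algorithms or parallelism, which matches the separation described above.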
The Optimizer module, anchored by the PlanOptimizer class, classifies queries using PlanPattern and applies three rewrite frameworks (visitor‑based, pattern‑match‑based, and Cascades‑style search) to perform transformations such as predicate push‑down, join reordering, and cost‑based enumeration.
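A visitor-based rewrite can be sketched as follows in Python. This is an illustrative toy, not the PlanOptimizer implementation: the plan classes (`Scan`, `Filter`, `Join`), the `PredicatePushdown` visitor, and the `used_columns` field are all hypothetical, and the join is assumed to be an inner join so that pushing a filter below it is safe.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


# Hypothetical plan nodes for illustration only.
@dataclass(frozen=True)
class Scan:
    table: str
    columns: FrozenSet[str]


@dataclass(frozen=True)
class Filter:
    predicate: str
    used_columns: FrozenSet[str]  # columns the predicate references
    child: "Plan"


@dataclass(frozen=True)
class Join:  # assumed to be an inner join
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Filter, Join]


def output_columns(plan: Plan) -> FrozenSet[str]:
    """Columns produced by a plan subtree."""
    if isinstance(plan, Scan):
        return plan.columns
    if isinstance(plan, Filter):
        return output_columns(plan.child)
    return output_columns(plan.left) | output_columns(plan.right)


class PredicatePushdown:
    """Visitor that moves a filter below a join when one side covers it."""

    def visit(self, plan: Plan) -> Plan:
        # Dispatch on the node's class name, e.g. visit_Filter.
        return getattr(self, "visit_" + type(plan).__name__)(plan)

    def visit_Scan(self, node: Scan) -> Plan:
        return node

    def visit_Join(self, node: Join) -> Plan:
        return Join(self.visit(node.left), self.visit(node.right))

    def visit_Filter(self, node: Filter) -> Plan:
        child = self.visit(node.child)
        if isinstance(child, Join):
            if node.used_columns <= output_columns(child.left):
                pushed = Filter(node.predicate, node.used_columns, child.left)
                return Join(self.visit(pushed), child.right)
            if node.used_columns <= output_columns(child.right):
                pushed = Filter(node.predicate, node.used_columns, child.right)
                return Join(child.left, self.visit(pushed))
        return Filter(node.predicate, node.used_columns, child)


orders = Scan("orders", frozenset({"order_id", "amount"}))
users = Scan("users", frozenset({"user_id", "name"}))
before = Filter("amount > 100", frozenset({"amount"}), Join(orders, users))
after = PredicatePushdown().visit(before)
```

After the rewrite, the filter sits directly above the `orders` scan, so fewer rows reach the join. A pattern-match-based framework would express the same transformation as a rule that fires on a "Filter over Join" shape, while a Cascades-style search would treat the alternatives as candidates to be costed.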
With the new optimizer, ByteHouse can run all 99 queries of the TPC‑DS benchmark, whereas the original ClickHouse could handle only a fraction of them. Reported performance improvements range from 6× to 10×. The system now supports advanced RBO features (column and partition pruning, expression simplification, etc.) and CBO capabilities (join enumeration, histogram‑based costing, distributed plan optimization, dynamic filter push‑down, materialized view rewrites, and CTE sharing).
The article concludes by inviting readers to try ByteHouse for free and join technical communities for further discussion.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.