Databases 25 min read

TXSQL Query Optimizer Framework: Transformation, Join Reorder, and Cost Model

This article introduces the TXSQL query optimizer built on MySQL 8.0.22, detailing its cascades‑style framework, transformation rewrite rules such as outer‑join elimination and subquery flattening, join‑order heuristics, cost‑model configuration, and execution strategies, providing a comprehensive overview of its design and enhancements.

Tencent Database Technology

May 31, 2021

TXSQL Query Optimizer Framework: Transformation, Join Reorder, and Cost Model

First Part – Query Optimizer Framework

Relational databases provide a general‑purpose system where SQL, as a declarative language, lets users describe *what* to do without specifying *how*; the optimizer must therefore select an efficient execution plan for each query.

Typical optimizers use a cost‑based model that performs SQL‑shape transformations, chooses access paths, and determines join order within a unified framework. Many databases adopt the Cascades framework; MySQL’s official documentation does not name its framework, but its process resembles a two‑stage model of shape changes followed by cost evaluation.

TXSQL inherits MySQL’s optimizer framework. This article series uses MySQL 8.0.22 as a baseline and briefly explains several core mechanisms of the MySQL optimizer.

1.1 Main Workflow

As shown in Figure 1‑1, after receiving an SQL statement, TXSQL parses it into an abstract syntax tree (AST). The optimizer then performs basic transformations on the AST, such as expression preprocessing and rewriting of the SELECT_LEX structure. After these basic changes, a cost‑based optimizer selects access paths and the optimal multi‑table join order. When the optimizer believes it has found the best plan, the logical plan is converted into a physical plan for execution.

In classic database optimizers this process is iterative and converges after multiple cost evaluations. MySQL, however, performs the optimization only once, which sacrifices some accuracy for faster response in cloud environments.

1.2 Main Data Structures

MySQL 8.0’s server layer contains several important data structures that describe the logical flow from parsing to execution. (The original diagram is retained below.)

Second Part – Transformation Algorithms

Transformation (or rewrite) algorithms convert a logical execution plan into a semantically equivalent form that is more efficient. MySQL already implements rewrites such as subquery flattening, outer‑join elimination, derived‑condition push‑down, predicate conversion, and materialized subqueries.

Rewrite algorithms are generally classified as rule‑based or cost‑based; MySQL currently relies mainly on rule‑based rewrites. Three basic rewrite algorithms are described below.

2.1 Outer‑Join Elimination

This rewrite converts an outer join into an inner join when a null‑rejecting condition is present, expanding the join order search space. The rewrite is triggered if at least one of the following holds:

the WHERE clause contains a null‑rejecting condition on the inner table, or

the outer join is nested inside another join that already has a null‑rejecting condition on the inner table.

Example:

select * from t1 left join t2 on t1.a = t2.a left join t3 on t2.a = t3.a where t3.b = 1;

The condition t3.b = 1 rejects rows where t3.b is NULL, making the left join on t3 equivalent to an inner join. Likewise, the join condition t2.a = t3.a is a null‑rejecting condition for the preceding t1 LEFT JOIN t2, so the whole plan can be rewritten to use inner joins only.

The resulting execution plan is:

-> Inner hash join (t3.a = t1.a) (cost=1.05 rows=1)</code>
    <code>   -> Filter: (t3.b = 1) (cost=0.35 rows=1)</code>
    <code>   -> Table scan on t3 (cost=0.35 rows=1)</code>
    <code>   -> Hash</code>
    <code>      -> Inner hash join (t2.a = t1.a) (cost=0.70 rows=1)</code>
    <code>         -> Table scan on t2 (cost=0.35 rows=1)</code>
    <code>         -> Hash</code>
    <code>            -> Table scan on t1 (cost=0.35 rows=1)

2.2 Subquery Flattening

MySQL transforms IN/EXISTS subqueries into semi‑joins or anti‑joins, pulling the subquery’s tables into the outer query block to improve performance.

Form 1 (IN subquery):

select t1.a from t1 where t1.a in (select t2.a from t2);</code>
  <code>→ select t1.a from t1 semi join t2 on t1.a = t2.a;

Form 2 (EXISTS subquery):

select t1.a from t1 where exists (select 1 from t2 where t1.a = t2.a);</code>
  <code>→ select t1.a from t1 semi join t2 on t1.a = t2.a;

Form 3 (NOT IN / NOT EXISTS) requires an additional NULL‑filter condition because of NULL semantics:

select t1.a from t1 where not exists (select 1 from t2 where t1.a = t2.a);</code>
  <code>→ select t1.a from t1 anti join t2 on t1.a = t2.a where t2.a is not null;

2.3 Derived‑Condition Push‑Down

This rewrite pushes external predicates into a subquery when doing so reduces the number of rows processed. The rule applies in three situations:

If the inner subquery has no aggregation or window functions, external predicates can be pushed down.

If the inner subquery has a GROUP BY but no window functions, and the external predicate does not reference a GROUP BY column, the predicate can be turned into a HAVING clause.

If the external predicate references a column that is also a GROUP BY column of the inner subquery, it can be pushed down directly.

MySQL 8.0.22 implements this algorithm for any of the above scenarios.

Third Part – Join Reorder Algorithm

For an N‑table join, the left‑deep rule alone yields roughly N! possible join orders (e.g., 10 tables → 3,628,800 orders). Enumerating all plans is infeasible. MySQL uses a greedy algorithm with pruning: each step selects the locally optimal join based on cost, while exploring a controlled breadth of alternatives.

t1 has 17 found rows; t2 has 4 found rows; t3 has 2 found rows; t4 has 3 found rows;</code>
    <code>SQL-1: select * from t2 inner join t1 on t1.a = t2.a inner join t3 on t1.a = t3.a inner join t4 on t3.a = t4.a;</code>
    <code>SQL-2: select * from t2 inner join t1 on t1.a = t2.a inner join t3 on t1.a = t3.a straight_join t4 on t3.a = t4.a;</code>
    <code>SQL-3: select /*+ join_order(t1,t2,t3,t4) */ * from t2, t1, t3, t4 where t1.a = t2.a and t1.a = t3.a and t3.a = t4.a;

3.1 Heuristic Pruning Rules

Two system variables control cost‑based pruning and search depth: optimizer_prune_level (default 1) enables pruning of branches whose cost exceeds the best cost at the same depth. optimizer_search_depth (default 62) limits the depth of the local‑optimal search; a smaller value reduces the search space for queries with many tables.

The optimizer first sorts tables heuristically by dependency, row count (smaller tables first), and appearance order. In the example, the order (t3, t4, t2, t1) is derived from row‑count sorting, and the greedy search with a depth of 2 yields the final join order (t3, t4, t2, t1).

3.2 Specifying Join Order

Users can force a join order via: STRAIGHT_JOIN, which forces the optimizer to follow the join order written in the query.

Join‑order hints (e.g., /*+ join_order(t1,t2,t3,t4) */) that bypass the optimizer’s search.

In the examples, SQL‑2 forces the order (t3, t2, t1, t4) with STRAIGHT_JOIN, while SQL‑3 forces (t1, t2, t3, t4) via a hint.

Fourth Part – Cost Model

4.1 Adjustable Weights

MySQL assigns a cost to each operator and aggregates these to evaluate whole plans. Since MySQL 5.7, administrators can adjust unit costs via two system tables, mysql.server_cost and mysql.engine_cost, allowing hardware‑specific tuning.

mysql> select * from mysql.server_cost;</code>
    <code>+------------------------------+------------+---------------------+---------+---------------+</code>
    <code>| cost_name                    | cost_value | last_update         | comment | default_value |</code>
    <code>+------------------------------+------------+---------------------+---------+---------------+</code>
    <code>| disk_temptable_create_cost   | NULL       | 2021-03-25 23:07:10 | NULL    | 20            |</code>
    <code>| disk_temptable_row_cost      | NULL       | 2021-03-25 23:07:10 | NULL    | 0.5           |</code>
    <code>| key_compare_cost             | NULL       | 2021-03-25 23:07:10 | NULL    | 0.05          |</code>
    <code>| memory_temptable_create_cost | NULL       | 2021-03-25 23:07:10 | NULL    | 1             |</code>
    <code>| memory_temptable_row_cost    | NULL       | 2021-03-25 23:07:10 | NULL    | 0.1           |</code>
    <code>| row_evaluate_cost            | NULL       | 2021-03-25 23:07:10 | NULL    | 0.1           |</code>
    <code>+------------------------------+------------+---------------------+---------+---------------+</code>
    <code>mysql> select * from mysql.engine_cost;</code>
    <code>+-------------+-------------+------------------------+------------+---------------------+---------+---------------+</code>
    <code>| engine_name | device_type | cost_name              | cost_value | last_update         | comment | default_value |</code>
    <code>+-------------+-------------+------------------------+------------+---------------------+---------+---------------+</code>
    <code>| default     | 0           | io_block_read_cost     | NULL       | 2021-03-25 23:07:10 | NULL    | 1             |</code>
    <code>| default     | 0           | memory_block_read_cost | NULL       | 2021-03-25 23:07:10 | NULL    | 0.25          |</code>
    <code>+-------------+-------------+------------------------+------------+---------------------+---------+---------------+

4.2 Viewing Costs

MySQL 8.0 switched to a Volcano execution framework, representing plans as trees. Using EXPLAIN FORMAT=tree makes plan inspection straightforward. Detailed cost tracing can be enabled with SET SESSION optimizer_trace='enabled=on' and then querying information_schema.optimizer_trace to see the optimizer’s decision process.

Fifth Part – Query Execution

In MySQL 5.7 the physical layer lacked explicit operator concepts, scattering logic across many modules. Starting with 8.0, a Volcano framework materializes physical operators such as Scan, Filter, Sort, Aggregate, and Join, decoupling optimization from execution.

Expressions are evaluated lazily: they are computed only after a blocking operator finishes or when results are emitted, reducing unnecessary work.

5.1 Execution Framework

The Volcano model splits a plan into multiple stages based on whether an operator is blocking. For example, the following query is transformed into a binary tree of five physical operators, each stage highlighted with a different color:

SQL‑4: select t1.b, sum(t1.c) from t1 inner join t2 on t1.a = t2.a where t1.a = 100 group by t1.b

5.2 Execution Pre‑Move

The optimizer can pre‑compute “const tables” (tables with a single row or filtered by a unique key) during the optimization phase; a placeholder empty operator is still generated in the physical plan for consistency. Non‑correlated subqueries may also be materialized and cached, avoiding repeated evaluation for each outer row.

Sixth Part – Optimizer Enhancements

SQL Outline

Occasionally the optimizer picks a sub‑optimal plan due to inaccurate weight settings or skewed statistics. Changing server‑side cost parameters is time‑consuming, so TXSQL introduces an “outline” mechanism that forces a specific plan without altering the original SQL. By enabling cdb_opt_outline_enabled and binding USE INDEX(i1) to table t1, the server will follow the user‑specified plan.

Seventh Part – Summary

The article briefly introduced the main modules of the TXSQL optimizer, covering the framework, transformation rules, join‑order heuristics, cost model, execution architecture, and upcoming enhancements. Future posts will dive deeper into each component to help readers understand the inner workings of the TXSQL optimizer.

Tencent Database Technology Team – internal support for QQ Space, WeChat Red Packets, Tencent Ads, Tencent Music, Tencent News, and external support for Tencent Cloud products such as TDSQL‑C, TencentDB for MySQL, CTSDB, MongoDB, CES, etc. The public account aims to share professional database knowledge with enthusiasts.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

database mysql Transformation Cost Model Join Reorder Query Optimizer

Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.