Inside MySQL 8.0 Optimizer: From SQL String to Execution Plan
This article walks through MySQL 8.0’s SQL optimizer, explaining how a raw SQL statement is parsed, transformed, and optimized through stages such as Setup, Resolve, Derived‑table merging, materialization, window‑function handling, and ROLLUP processing, illustrated with diagrams and code examples.
Background and Architecture
Programs implement application logic, but a database provides a fast way to retrieve data by executing SQL strings. Without a database, every data request would require custom program code, which is hard to maintain and not portable. The optimizer abstracts data access so users can write generic SQL without worrying about the underlying data interface.
What the SQL Optimizer Does
The optimizer is a highly abstracted data‑interface implementation that lets clients use familiar SQL to manipulate data while freeing their applications from low‑level data‑access details.
Setup and Resolve
setup_tables – set up table leaves in the query block.
resolve_placeholder_tables / merge_derived / setup_table_function / setup_materialized_derived – resolve derived tables, views, or table‑function references.
setup_natural_join_row_types – compute row types for NATURAL/USING joins.
setup_wild – expand ‘*’ to concrete column references.
setup_base_ref_items – build base_ref_items for the query block.
setup_fields – check fields and fill structures.
setup_conds – resolve WHERE and join conditions.
setup_group – set up GROUP BY list.
m_having_cond‑>fix_fields – fix HAVING clause.
resolve_rollup – process ROLLUP items.
resolve_rollup_item – replace grouped expressions with rollup items.
setup_order – set up ORDER BY clause.
resolve_limits – resolve OFFSET and LIMIT.
Window::setup_windows1 – prepare window functions.
setup_ftfuncs – set up full‑text functions.
resolve_rollup_wfs – propagate nullability for window functions in ROLLUP.
Detailed Conversion Process
The conversion starts from Query_expression calling prepare in sql/sql_resolver.cc and proceeds top‑down or bottom‑up according to transformation rules.
1. Propagate Nullability to Join Tables (propagate_nullability)
Nullable tables may contain all‑NULL rows; if a table is nullable, certain optimizations (e.g., EQ_REF access, outer‑join to inner‑join conversion) are disabled.
2. Resolve and Set Up Query‑Block Leaf Tables (setup_tables)
SELECT t1.c1 FROM t1, (SELECT t2.c1 FROM t2, (SELECT t3.c1 FROM t3 UNION SELECT t4.c1 FROM t4) AS t3a) AS t2a;Before setup_tables, each Query_block has zero leaf tables. setup_tables builds the leaf_tables list (base tables and derived tables) for later optimization. It does not recurse; derived tables are handled later by resolve_placeholder_tables → resolve_derived → derived(Query_expression)::prepare → Query_block::prepare.
3. Resolve Derived Tables, Views, and Table Functions (resolve_placeholder_tables)
If a derived table has already been merged or is forced to be materialized (e.g., CREATE ALGORITHM=TEMPTABLE VIEW), the function skips it.
4. Merge Derived Tables (merge_derived)
Prerequisites
Outer query block allows merge (allow_merge_derived).
Lexical context permits merge (lex‑>can_use_merged()).
Derived table is not forced to be materialized.
No disallowed constructs: UNION, GROUP BY with aggregation, HAVING, WINDOW, LIMIT, or empty table list.
Optimizer switch does not disable derived_merge.
Heuristic suggests merging.
Transformation Steps
Use derived_table‑>nested_join to handle outer‑join cases.
Merge underlying tables into NESTED_JOIN via merge_underlying_tables().
Move all tables from the derived table into the parent query’s table_list and delete the derived table node.
Re‑compute parent query statistics (leaf table count, derived table count, etc.).
Propagate OPTION_SCHEMA_TABLE and nullable attributes.
Merge derived‑table WHERE clause into parent ( merge_where()).
Create field translations for columns referenced from the derived table.
Remove the derived‑table structure from the parent ( exclude_level()).
Fix column references after pull‑out ( fix_tables_after_pullout()).
Remap table positions in TABLE_LIST.
If the derived table contains ORDER BY and the parent allows ordering (single derived table, not a UNION, no GROUP BY/aggregation), keep the ORDER BY; otherwise drop it.
Diagram of the merge_derived process.
5. Materialize Remaining Derived Tables (setup_materialized_derived)
Derived tables that cannot be merged are materialized. The function sets up a temporary table containing all rows of the derived table; actual materialization occurs later in the executor phase.
setup_materialized_derived_tmp_table – create temporary table for materialized rows.
check_materialized_derived_query_blocks – prepare query‑block structures for the materialized derived table.
trace_derived.add_utf8_table(this).add_utf8("function_name", func_name, func_name_len).add("materialized", true);6. Handle Table Functions (setup_table_function)
If a query block contains a table function, the resolver processes the block twice: first skipping the table‑function’s table, then applying the full resolution logic to it.
trace_derived.add_utf8_table(this).add_utf8("function_name", func_name, func_name_len).add("materialized", true);7. Expand Wildcards (setup_wild)
Expands ‘*’ in the SELECT list to concrete column references.
8. Build Base Reference Items (setup_base_ref_items)
Creates base_ref_items that record the position of every Item in the query block, enabling other items to reference them (e.g., subquery references, aggregate references, outer‑query references, null‑helper items).
9. Resolve Fields and Check Permissions (setup_fields)
Fixes fields, resolves references, and checks column privileges. Complex sub‑query cases are illustrated in the diagram.
10. Resolve WHERE and JOIN Conditions (setup_conds)
Recursively resolves join conditions, simplifies constant conditions, and replaces removable constant items with Item_func_true/false.
11. Process ROLLUP (resolve_rollup)
ROLLUP adds hierarchical aggregation levels after GROUP BY, allowing multi‑level analysis with a single query.
SELECT YEAR, country, product, SUM(profit) AS profit FROM sales GROUP BY YEAR, country, product WITH ROLLUP;12. Resolve GROUP BY / ORDER BY (setup_group / setup_order)
Functions such as find_order_in_list() map SELECT fields to ORDER BY columns, adding missing columns to the final projection and fixing fields.
13. Remove Redundant Subquery Clauses
When a subquery lacks aggregation or HAVING, unnecessary ORDER/DISTINCT/GROUP BY clauses can be stripped (e.g., converting
SELECT c1 FROM t1 WHERE t1.c2 IN (SELECT DISTINCT c1 FROM t2 GROUP BY c1, c2 ORDER BY c1)to SELECT c1 FROM t1 WHERE t1.c2 IN (SELECT c1 FROM t2)).
select c1 from t1 where t1.c2 in (select distinct c1 from t2 group by c1, c2 order by c1); select c1 from t1 where t1.c2 in (select c1 from t2);14. Window Functions (Window::setup_windows1)
Parses window definitions, resolves partition and order clauses, checks DAG relationships among windows, and determines whether a window is static or dynamic, and whether row‑ or range‑based optimization is possible.
SELECT id, release_year, rating, avg(rating) OVER (PARTITION BY release_year) AS year_avg FROM tw;Conclusion
The article covered rule‑based optimizations performed by MySQL 8.0’s optimizer, focusing on fundamental SQL elements such as tables, columns, functions, aggregates, grouping, and ordering. The next part will dive into sub‑queries, partitioned tables, and complex JOIN transformations.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
