Databases 12 min read

SQL Query Analysis and Access Path Strategies

This article explains how relational optimizers validate, analyze, and transform SQL statements into executable code, detailing the four-step optimization process, query‑analysis factors, cost‑based access‑path selection, table scans, index lookups, and the role of data prefetching in improving performance.

360 Tech Engineering

Jan 7, 2019

SQL Query Analysis and Access Path Strategies

In the first part of the SQL performance overview we examined relational optimization; this article shifts focus to query analysis and how SQL is converted into executable code, aiming to help readers improve SQL performance.

The optimization process consists of four steps:

Receive and validate the SQL statement.

Analyze the environment and determine optimization methods for the SQL.

Create machine‑readable instructions to execute the optimized SQL.

Execute those instructions or store them for later execution.

The first step checks that the SQL syntax is correct; if errors are found the process stops until the statement is fixed. After syntax validation, semantic checks such as data types, referential constraints, check constraints, views, and triggers are performed.

The second step, query analysis, evaluates the overall complexity of the SQL. Factors such as the number and type of predicates, presence of functions, sorting clauses, and estimated cost (CPU, I/O, statistics) are considered. The optimizer scans the statement to determine which tables are needed, whether views must be expanded, whether joins or sub‑queries are required, whether UNION/EXCEPT/INTERSECT are used, which indexes can be applied, how many predicates must be satisfied, which functions must be executed, and whether logical operators AND/OR are present. It also assesses how the DBMS will handle each component and the memory allocated for caching table data or sorting.

In other words, query analysis breaks the SQL into discrete tasks that must be performed to produce the result set.

Modern cost‑based optimizers aim to choose the lowest‑cost access path by applying cost formulas that weigh CPU usage, I/O operations, catalog statistics, and the actual SQL code. The optimizer may rewrite the query, push down predicates, and evaluate multiple access paths before selecting the cheapest one, finally generating executable code.

Access Path

Relational optimizers have many options for creating SQL access paths. At a high level they can access data from a single table or combine data from multiple tables, forming a sequence of methods that constitute the overall access path.

For single‑table access, a scan or an index lookup can be used. After determining which indexes are available for each predicate, the optimizer decides whether to use a single index, multiple indexes, or none.

Although index access often seems superior to a full table scan, the optimizer must consider the amount of data to be read and the nature of the query. For example, generating a report that reads every row may be slower with an index than with a scan.

Table scans read every row of a table; some DBMSs also support tablespace scans that read every page of a tablespace, which can be slower due to extra I/O. Partition scans limit the scan to relevant partitions, reducing I/O compared to full scans.

Scans are chosen when:

No suitable index exists or predicates do not match an index.

A high percentage of rows satisfy the predicates, making index use inefficient.

Available indexes have a high clustering factor and benefit only a small subset of rows.

The table is small, where index overhead outweighs benefits.

To improve scan performance, optimizers may invoke data prefetching, which reads data pages into cache before they are requested, reducing I/O latency for sequential accesses.

Index Access

Most accesses should use an index when possible. The optimizer first discovers whether an index exists; if not, it falls back to a scan. Indexable predicates must reference at least one indexed column, and the set of indexable predicates varies by DBMS and version.

Index access methods include direct index lookups, where the values for all indexed columns are provided, allowing the DBMS to traverse the index tree directly to the leaf pages. If not all key columns are supplied, the optimizer may perform an index scan.

Index scans come in two forms:

Matched (or absolute) index scan: starts at the index root and follows the tree using the highest‑order (first) column of the index key; because the full key is not available, the DBMS scans index pages to find matching values.

Unmatched (or relative) index scan: used when the first key column is not specified; the DBMS scans index pages sequentially from the first page, applying predicates to subsequent pages.

Unmatched scans can be more efficient than table scans because index pages contain more entries per page, reducing I/O.

Summary

In this article we examined query analysis and access‑path formulas, learning about the components of query analysis and single‑table access methods. Future articles will explore multi‑table access strategies used by relational optimizers.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL Query Optimization Database Performance Access Path cost-based optimizer Index Scan

Written by

360 Tech Engineering

Official tech channel of 360, building the most professional technology aggregation platform for the brand.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.