
Demystifying DB2 Optimizer: How Cost Models Shape Query Performance

This article explains the inner workings of the DB2 optimizer, its four-step processing flow, cost‑based decision making, and detailed examples comparing full‑table and index scans, followed by practical tuning tips and a Q&A session for real‑world query optimization.

dbaplus Community

Optimizer Overview

The DB2 optimizer is an expert system that chooses the access plan for each SQL statement, so queries can be written without regard to how the data is physically stored. Based on statistics and cost estimates, it decides whether to use a table scan or an index scan and which join method to apply.

[Figure: DB2 Optimizer]

Four‑Step Processing Flow

1. Parse the received SQL and validate its syntax and semantics.

2. Analyze the environment, rewrite the original SQL where that helps, and generate the lowest-cost execution plan.

3. Create machine-readable instructions for the optimized SQL.

4. Store the plan for later execution.

Cost Model Basics

DB2 uses a Cost-Based Optimizer (CBO). For each candidate plan it estimates the CPU cost and the I/O cost from the catalog statistics and the SQL statement itself, then selects the plan with the lowest total estimated cost.

[Figure: Cost Formula Example]

Example: Full‑Table Scan vs Index Scan

Table T1 contains 100,000 rows occupying 5,000 pages. Statistics: CARD=100,000, NPAGES=5,000, C1 COLCARD=100, C2 COLCARD=1,000.

Full‑Table Scan Cost

I/O: 5,000 ms (1 ms per page).

CPU page cost: 500 ms (0.1 ms per page).

CPU scan cost: 2,000 ms (0.01 ms per predicate per row; at 100,000 rows this corresponds to two predicates evaluated on every row).
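The three components above can be added up in a few lines. This is a minimal sketch that only reuses the article's illustrative unit costs; the function name is made up for the example, and the predicate count of two is inferred from the 2,000 ms figure rather than stated in the original.

```python
# Illustrative unit costs from the example above; these are not real DB2 parameters.
IO_MS_PER_PAGE = 1.0
CPU_MS_PER_PAGE = 0.1
CPU_MS_PER_PREDICATE_ROW = 0.01


def full_scan_cost_ms(npages: int, card: int, predicates: int) -> float:
    """Estimated cost of reading every page and testing every row."""
    io_cost = npages * IO_MS_PER_PAGE                            # read all pages
    cpu_page_cost = npages * CPU_MS_PER_PAGE                     # process each page
    cpu_row_cost = card * predicates * CPU_MS_PER_PREDICATE_ROW  # test each row
    return io_cost + cpu_page_cost + cpu_row_cost


# T1: NPAGES=5,000 and CARD=100,000; predicates=2 is an inferred assumption.
print(round(full_scan_cost_ms(npages=5_000, card=100_000, predicates=2), 2))
```

With these inputs the sum is 5,000 + 500 + 2,000 ms, the roughly 7,500 ms used for the full-table scan in this example.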

[Figure: Full Table Scan Cost]

Index Scan Cost (index IX1 on C1)

I/O for index nodes: 3.22 ms.

I/O for data pages (50 pages for 1,000 qualifying rows): 65 ms.

Total index scan cost: 68.22 ms.
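The index-scan total can be reproduced the same way. The sketch below takes the 3.22 ms index-node figure as given and does not try to derive it. The breakdown of the 65 ms data-page figure (50 page reads, 50 page-processing charges, and one predicate test on each of the 1,000 rows) is one decomposition that happens to match the number; the article does not spell it out, so treat it as an assumption. It also assumes values are uniformly distributed and that the qualifying rows are clustered on 50 pages.

```python
# Illustrative unit costs from the article's example; not real DB2 parameters.
IO_MS_PER_PAGE = 1.0
CPU_MS_PER_PAGE = 0.1
CPU_MS_PER_PREDICATE_ROW = 0.01
INDEX_NODE_COST_MS = 3.22  # taken as given from the article, not derived here


def data_page_cost_ms(card: int, npages: int, colcard: int, predicates: int = 1) -> float:
    """Cost of fetching the qualifying rows (assumed breakdown, see lead-in)."""
    rows = card / colcard     # uniform distribution: 100,000 / 100 = 1,000 rows
    pages = npages / colcard  # clustered rows: 5,000 / 100 = 50 pages
    page_cost = pages * (IO_MS_PER_PAGE + CPU_MS_PER_PAGE)
    row_cost = rows * predicates * CPU_MS_PER_PREDICATE_ROW
    return page_cost + row_cost


def index_scan_cost_ms(card: int, npages: int, colcard: int) -> float:
    """Index-node cost plus the cost of fetching the qualifying data pages."""
    return INDEX_NODE_COST_MS + data_page_cost_ms(card, npages, colcard)


# IX1 on C1: CARD=100,000, NPAGES=5,000, COLCARD(C1)=100
print(round(index_scan_cost_ms(card=100_000, npages=5_000, colcard=100), 2))
```

If the qualifying rows were scattered rather than clustered, the page count could approach the row count and the estimate would rise accordingly.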

[Figure: Index Scan Cost]

The estimated cost of the index scan (≈68 ms) is about 110 times lower than that of the full-table scan (≈7,500 ms), illustrating the benefit of an appropriate index.

Practical Tuning Guidelines

Prefer index scans when the table occupies many pages, the index occupies few, and the predicate is highly selective (it filters out most of the rows).

When several indexes exist, compare the selectivity of the leading key; equality predicates on highly selective columns favor index use.

Avoid unnecessary sorting by ensuring required ordering columns are part of the index or by pushing DISTINCT down to individual tables.

For nested-loop joins, keep the outer table result set small and ensure the inner table has an efficient access path. Keep statistics current with RUNSTATS, and use REORG to reorganize tables and indexes that have become fragmented.
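The nested-loop guideline follows from simple arithmetic: the inner table is probed once per outer row, so both the outer row count and the per-probe cost multiply into the total. The sketch below is a simplified model with hypothetical numbers, not DB2's actual join costing.

```python
def nested_loop_cost_ms(outer_rows: int, outer_access_ms: float, inner_probe_ms: float) -> float:
    """Read the outer input once, then probe the inner table once per outer row."""
    return outer_access_ms + outer_rows * inner_probe_ms


# All figures below are hypothetical and chosen only for illustration.
small_outer_indexed_inner = nested_loop_cost_ms(100, 70.0, 3.5)
large_outer_indexed_inner = nested_loop_cost_ms(10_000, 7_500.0, 3.5)
small_outer_scanned_inner = nested_loop_cost_ms(100, 70.0, 7_500.0)

print(small_outer_indexed_inner)   # small outer, index probe on inner
print(large_outer_indexed_inner)   # large outer, index probe on inner
print(small_outer_scanned_inner)   # small outer, full scan of inner per probe
```

In this model a small outer input with an indexed inner table is far cheaper than either alternative, and rescanning the inner table on every probe is the most expensive case even when the outer input is small.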

Technical Q&A

Q1: In a nested‑loop join, should the larger table be placed first when only qualified rows are considered?

A: Both orders are possible; DB2’s cost model decides based on the access methods available for the inner table.

Q2: Is there a third‑party tool for tracking the most resource‑intensive SQL statements?

A: IBM Data Server Manager provides monitoring and alerting, with a free edition available.

Q3: Are the assumed costs (e.g., 0.01 ms CPU per predicate per row, 1 ms I/O per page) configurable inside DB2?

A: The values shown are illustrative; actual costs depend on hardware and DB2 version, and DB2 uses internal estimates rather than fixed parameters.

Q4: How does frequent data change affect execution plans, and how can the optimal plan be maintained?

A: Mark tables as VOLATILE to encourage index scans; run RUNSTATS regularly to keep statistics accurate, and run REORG when the data has become fragmented.

Q5: How to choose a reasonable join order when many tables are involved?

A: Analyze the current plan, verify statistics, and consider creating or adjusting indexes to influence the optimizer’s choice.

Q6: How does the optimizer handle an index on a column with only two distinct values?

A: Such indexes are generally not recommended, unless data skew makes one of the two values rare enough that a predicate on it is highly selective.
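A small sketch of why skew matters for a two-value column: under a uniform assumption each value appears to match half the table, while per-value frequencies (the kind of information distribution statistics are meant to capture) show that the rare value is highly selective. The column values and counts below are hypothetical.

```python
def uniform_selectivity(colcard: int) -> float:
    """Selectivity of an equality predicate if all values are equally common."""
    return 1.0 / colcard


def frequency_selectivity(value_counts: dict, value: str) -> float:
    """Selectivity of an equality predicate when per-value counts are known."""
    total = sum(value_counts.values())
    return value_counts.get(value, 0) / total


# Hypothetical status column with two distinct values and heavy skew.
counts = {"OPEN": 1_000, "CLOSED": 99_000}

print(uniform_selectivity(len(counts)))          # estimate without skew information
print(frequency_selectivity(counts, "OPEN"))     # rare value: index likely useful
print(frequency_selectivity(counts, "CLOSED"))   # common value: scan likely cheaper
```

Here an index could pay off for predicates on the rare value only; for the common value it would touch nearly every row.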

Q7: What are best practices for creating indexes in DB2?

A: Keep indexes few and purposeful; use the CURRENT EXPLAIN MODE special register (for example, RECOMMEND INDEXES) to obtain index recommendations, and evaluate their impact on the whole workload rather than adopting per-query suggestions one by one.

Q8: When should a joint distribution key be used in DPF databases, and is its hash based on multiple columns?

A: Use a joint key when queries frequently filter on multiple columns together; the hash is derived from the combined columns.
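To illustrate what "derived from the combined columns" means, the sketch below hashes a two-column key to a partition number. It uses CRC32 purely as a stand-in; DB2's actual hashing function and distribution map are different, and the function name and sample rows are hypothetical.

```python
import zlib


def partition_for(key_columns: tuple, num_partitions: int) -> int:
    """Map the combined key columns to a partition (CRC32 is a stand-in hash)."""
    # repr() gives a stable text form of the whole tuple, so every key column
    # contributes to the hash value.
    combined = repr(key_columns).encode("utf-8")
    return zlib.crc32(combined) % num_partitions


# Hypothetical rows keyed on (customer_id, region).
rows = [(1, "EAST"), (1, "WEST"), (2, "EAST"), (1, "EAST")]
for key in rows:
    print(key, "-> partition", partition_for(key, 4))
```

Rows with identical values in both columns always land on the same partition. A predicate on only one of the columns does not determine the hash, so such a query cannot be routed to a single partition.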

Q9: How to handle cases where predicate selectivity cannot be estimated for multi‑table or temporary table joins?

A: Use statistical views to give the optimizer more accurate estimates; see IBM developerWorks articles for details.
