
Why Oracle’s Optimizer Misestimates Multi‑Column Filters and How to Fix It

This article explains how Oracle’s Cost‑Based Optimizer can produce incorrect row estimates for queries with multiple column predicates, demonstrates the problem with a test table, and shows that gathering multi‑column statistics resolves the misestimation, improving execution plans.

ITPUB

Background

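When a SQL statement is submitted to an Oracle database, the query optimizer chooses an execution plan for it. By default this is the Cost‑Based Optimizer (CBO), which estimates the cost of each candidate plan by combining expected CPU and I/O consumption into a single numeric value and picks the plan with the lowest cost. Those cost figures depend heavily on how many rows the optimizer expects each step to return.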

Problem Statement

For queries with multiple predicates in the WHERE clause, Oracle by default treats the columns as independent and multiplies the selectivities of the individual predicates. When the columns are in fact correlated, the combined selectivity comes out too low, the row estimate is too small, and the optimizer may choose a sub‑optimal plan. Oracle 11g introduced the ability to gather multi‑column statistics (column groups) to address this issue.

Environment Setup

SQL> select * from v$version;

Running on Oracle Database 11g Enterprise Edition Release 11.2.0.3.0.

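SQL> conn hr/hr
SQL> create table hoegh as select * from employees;
-- Run the next insert four times; each run doubles the table: 107 -> 214 -> 428 -> 856 -> 1712 rows
SQL> insert into hoegh select * from hoegh;
SQL> commit;
SQL> select count(*) from hoegh;
-- Result: 1712 rows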
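
Before gathering statistics, it can help to confirm the shape of the test data. The query below is a small sanity check added for illustration, not part of the original test. It assumes the table was built from the standard HR sample schema, where every employee has a distinct email.

```sql
-- Row count and distinct-value counts for the two columns used later.
-- If hoegh holds 16 copies of each employees row, this should report
-- 1712 total rows and 107 distinct values in each column.
select count(*)                    as total_rows,
       count(distinct employee_id) as distinct_ids,
       count(distinct email)       as distinct_emails
from   hoegh;
```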

Collecting Statistics the Conventional Way

SQL> exec dbms_stats.gather_table_stats('HR','HOEGH');
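
To see what the optimizer now knows about each column, the gathered statistics can be read back from the data dictionary. This is a sketch for inspection only; it assumes the session is connected as the table owner (HR), so the USER_ views apply.

```sql
-- Column-level statistics recorded for the test table.
-- NUM_DISTINCT drives the equality-predicate estimate when
-- HISTOGRAM shows NONE for the column.
select column_name,
       num_distinct,
       density,
       num_nulls,
       histogram
from   user_tab_col_statistics
where  table_name = 'HOEGH'
and    column_name in ('EMPLOYEE_ID', 'EMAIL');
```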

Execution Plan for a Single‑Column Predicate

SQL> explain plan for select * from hoegh where employee_id = 110;
SQL> select * from table(dbms_xplan.display);
-- Plan shows 16 rows estimated, full table scan.

When a column has no histogram, the optimizer estimates the selectivity of an equality predicate as 1 / (number of distinct values). With 107 distinct employee IDs, the selectivity is 1/107, so the estimated row count is 1712 × 1/107 = 16.
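
The same arithmetic can be reproduced from the dictionary statistics. The query below is a simplified sketch: it mirrors only the basic rows / distinct-values formula and ignores nulls, histograms, and other adjustments the real optimizer may apply.

```sql
-- Rebuild the single-column estimate: num_rows / num_distinct.
select t.num_rows,
       c.num_distinct,
       round(t.num_rows / c.num_distinct) as estimated_rows
from   user_tables t
join   user_tab_col_statistics c
on     c.table_name = t.table_name
where  t.table_name  = 'HOEGH'
and    c.column_name = 'EMPLOYEE_ID';
```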

Execution Plan for a Two‑Column Predicate (Before Multi‑Column Stats)

SQL> explain plan for select * from hoegh where employee_id = 110 and email = 'JCHEN';
SQL> select * from table(dbms_xplan.display);
-- Plan shows 1 row estimated, full table scan.

The query actually returns 16 rows, but the plan predicts only 1. The optimizer treats the two predicates as independent and multiplies their selectivities: 1712 × 1/107 × 1/107 ≈ 0.15 rows, which it rounds up to 1.
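
One way to see the estimate and the actual row count side by side is to execute the statement with row-source statistics enabled and then display the cursor's plan. This is a sketch under a few assumptions: the HR user has been granted read access to the V$ views that DBMS_XPLAN.DISPLAY_CURSOR relies on, and SERVEROUTPUT is off in SQL*Plus so that the query is still the last statement executed in the session.

```sql
-- Execute the query and collect per-step runtime statistics.
select /*+ gather_plan_statistics */ *
from   hoegh
where  employee_id = 110
and    email = 'JCHEN';

-- Show the plan of the last statement executed in this session.
-- E-Rows is the optimizer's estimate; A-Rows is the actual row count.
select *
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

A large gap between E-Rows and A-Rows on a filter with several equality predicates is a typical hint that the columns are correlated.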

Gathering Multi‑Column Statistics

SQL> exec dbms_stats.gather_table_stats('HR','HOEGH', method_opt=>'for columns(employee_id,email)');
SQL> explain plan for select * from hoegh where employee_id = 110 and email = 'JCHEN';
SQL> select * from table(dbms_xplan.display);
-- Plan now shows 16 rows estimated, full table scan.
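
As an alternative to the method_opt syntax, a column group can also be created explicitly and then picked up by an ordinary statistics gather. The sketch below is meant to be run instead of the call shown earlier, not in addition to it, since creating a column group that already exists raises an error.

```sql
-- Define the column group; the function returns its system-generated name.
select dbms_stats.create_extended_stats('HR', 'HOEGH', '(EMPLOYEE_ID,EMAIL)')
         as extension_name
from   dual;

-- Gather table statistics again so the new column group gets statistics too.
begin
  dbms_stats.gather_table_stats('HR', 'HOEGH');
end;
/
```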

After statistics are gathered on the (employee_id, email) column group, the optimizer knows how many distinct combinations of the two values exist. Because each employee_id always appears with the same email, there are only 107 distinct pairs rather than 107 × 107, so the estimate becomes 1712 / 107 = 16 rows, which matches the actual result.
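
The column group and its distinct-value count can be checked in the data dictionary. This query is an illustrative sketch that assumes the session is connected as the table owner; the extension name itself is generated by Oracle and will differ between systems.

```sql
-- List column groups on the table together with their statistics.
-- Extension statistics are stored under the extension's generated name
-- in the column-statistics view.
select e.extension_name,
       e.extension,
       s.num_distinct
from   user_stat_extensions e
join   user_tab_col_statistics s
on     s.table_name  = e.table_name
and    s.column_name = e.extension_name
where  e.table_name = 'HOEGH';
```

For this test table one would expect NUM_DISTINCT for the group to equal the number of distinct employees, which is what brings the estimate back to 16 rows.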

Conclusion

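Gathering multi‑column statistics, available since Oracle 11g, corrects the optimizer’s row estimate for queries with correlated predicates. In this test the estimate moved from 1 row to the actual 16. The plan remained a full table scan because the test table has no indexes, but in real schemas an accurate row estimate leads to more accurate cost calculations and better choices of access path and join method.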
