Why Oracle’s Optimizer Misestimates Multi‑Column Filters and How to Fix It
This article explains how Oracle’s Cost‑Based Optimizer can produce incorrect row estimates for queries with multiple column predicates, demonstrates the problem with a test table, and shows that gathering multi‑column statistics resolves the misestimation, improving execution plans.
Background
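When a SQL statement is submitted to an Oracle database, the query optimizer chooses an execution plan for it. By default this is the Cost‑Based Optimizer (CBO), which estimates the cost of each candidate plan by combining expected CPU and I/O consumption into a single numeric value and picks the plan with the lowest estimate. Those cost figures depend on how many rows the optimizer expects each step to return.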
Problem Statement
For queries with multiple predicates in the WHERE clause, Oracle traditionally multiplies the selectivities of the individual columns, which assumes the columns are independent of each other. When the columns are correlated, this underestimates the combined selectivity and can cause the optimizer to choose a sub‑optimal plan. Oracle 11g introduced the ability to gather multi‑column statistics (column groups) to address this issue.
Environment Setup
SQL> select * from v$version;
-- Running on Oracle Database 11g Enterprise Edition Release 11.2.0.3.0.
SQL> conn hr/hr
SQL> create table hoegh as select * from employees;
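SQL> insert into hoegh select * from hoegh;
-- Run the insert four times in total; each run doubles the table (107 -> 214 -> 428 -> 856 -> 1712 rows).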
SQL> commit;
SQL> select count(*) from hoegh;
-- Result: 1712 rows
Collecting Statistics the Conventional Way
SQL> exec dbms_stats.gather_table_stats('HR','HOEGH');
Execution Plan for a Single‑Column Predicate
SQL> explain plan for select * from hoegh where employee_id = 110;
SQL> select * from table(dbms_xplan.display);
-- Plan shows 16 rows estimated, full table scan.
For an equality predicate on a column without a histogram, the optimizer calculates selectivity as 1 / distinct_values. With 107 distinct employee IDs, selectivity = 1/107, and estimated rows = 1712 * 1/107 = 16.
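The inputs to this calculation can be read from the data dictionary. The query below is a minimal sketch, assuming the statistics gathered above are in place and that it is run as the HR user; the est_rows alias is only an illustrative name. The histogram column is included because the 1 / distinct_values formula applies only when no histogram exists on the column.

```sql
-- Row count and number of distinct values as recorded by DBMS_STATS,
-- plus the row estimate they imply for an equality predicate.
select t.num_rows,
       c.num_distinct,
       c.histogram,
       round(t.num_rows / c.num_distinct) as est_rows
from   user_tables t
join   user_tab_col_statistics c
on     c.table_name = t.table_name
where  t.table_name  = 'HOEGH'
and    c.column_name = 'EMPLOYEE_ID';
```

With the figures quoted in this article (1712 rows, 107 distinct values), est_rows works out to 16, which matches the plan.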
Execution Plan for a Two‑Column Predicate (Before Multi‑Column Stats)
SQL> explain plan for select * from hoegh where employee_id = 110 and email = 'JCHEN';
SQL> select * from table(dbms_xplan.display);
-- Plan shows 1 row estimated, full table scan.
The query actually returns 16 rows, but the plan predicts only 1. The optimizer treats the two predicates as independent and multiplies their selectivities: 1712 * 1/107 * 1/107 ≈ 0.15 rows, which it rounds up to 1.
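The gap between the estimate and reality can be checked directly. The sketch below assumes the HOEGH table built earlier; the second query simply repeats the independence arithmetic with the row count and distinct-value counts quoted in this article hard-coded, so those literals would need adjusting for any other table.

```sql
-- Actual number of rows matching both predicates
select count(*) as actual_rows
from   hoegh
where  employee_id = 110
and    email = 'JCHEN';

-- Estimate under the independence assumption:
-- num_rows * (1 / NDV of employee_id) * (1 / NDV of email)
select round(1712 * (1/107) * (1/107), 2) as independent_estimate
from   dual;
```

The second query yields a value of roughly 0.15, well below one row, which is why the plan reports the minimum estimate of 1.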
Gathering Multi‑Column Statistics
SQL> exec dbms_stats.gather_table_stats('HR','HOEGH', method_opt=>'for columns(employee_id,email)');
SQL> explain plan for select * from hoegh where employee_id = 110 and email = 'JCHEN';
SQL> select * from table(dbms_xplan.display);
-- Plan now shows 16 rows estimated, full table scan.
With statistics on the (employee_id, email) column group, the optimizer knows how many distinct combinations of the two values exist. Because each employee_id always appears with the same email (both were unique in the source EMPLOYEES table), there are only 107 combinations, so the estimate becomes 1712 / 107 = 16 rows, which matches the actual result.
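To confirm that the column group exists and to see the statistics stored for it, the dictionary views can be queried. This is a sketch assuming it is run as the HR user right after the gather call above; the extension is stored under a system-generated name, so the join below looks it up rather than hard-coding it.

```sql
-- Column groups (extensions) defined on the table
select extension_name, extension
from   user_stat_extensions
where  table_name = 'HOEGH';

-- Statistics for the column group, stored under its system-generated name
select e.extension    as col_group,
       c.num_distinct,
       c.histogram
from   user_stat_extensions e
join   user_tab_col_statistics c
on     c.table_name  = e.table_name
and    c.column_name = e.extension_name
where  e.table_name = 'HOEGH';
```

If the reasoning above holds, num_distinct for the column group would be expected to equal the per-column value of 107, which is what brings the estimate back to 16 rows.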
Conclusion
Gathering multi‑column statistics in Oracle 11g fixes the optimizer’s misestimation for queries with correlated predicates, leading to more accurate cost calculations and better execution plans.
ITPUB
