How to Shrink Oracle Indexes for Skewed Columns Using Function Indexes
This article explains why conventional indexes waste space and perform poorly on highly skewed columns, introduces a decode‑based function index that excludes high‑frequency values, details the experimental setup with millions of rows, compares index size and query performance, and outlines the method's limitations.
When selecting columns for indexing, high selectivity and an even distribution of values are ideal, but many real-world tables exhibit severe value skew, where a few values account for most rows. In Oracle, the cost-based optimizer (CBO) often bypasses an index for high-frequency values because a full table scan is cheaper, yet a conventional B-tree index still stores an entry for every row with a non-NULL value, including the common ones. The result is a large index structure and unnecessary I/O.
Proposed Solution
Use a function index that maps the high‑frequency values to NULL so they are omitted from the index. The DECODE function can implement this mapping, e.g.:
DECODE(secondary, 'S', NULL, 'J', NULL, 'T', NULL, secondary)
Oracle B-tree indexes do not store entries whose key is entirely NULL, so only rows with low-frequency values are indexed, resulting in a much smaller B-tree.
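The mapping can be previewed without touching any table. The following is a minimal sketch that runs the same DECODE expression over a few literal values; the names sample_values, v, and indexed_key are hypothetical and used only for illustration.

```sql
-- Sketch: see how the DECODE mapping treats common and rare values.
-- sample_values, v and indexed_key are hypothetical names, not from the article.
WITH sample_values AS (
  SELECT 'S' AS v FROM dual UNION ALL
  SELECT 'J' FROM dual UNION ALL
  SELECT 'T' FROM dual UNION ALL
  SELECT 'W' FROM dual UNION ALL
  SELECT 'Q' FROM dual
)
SELECT v,
       DECODE(v, 'S', NULL, 'J', NULL, 'T', NULL, v) AS indexed_key
FROM   sample_values;
```

Rows whose indexed_key comes back NULL (S, J, and T here) are the ones that would get no entry in the function index; W and Q pass through unchanged.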
Experimental Setup
A table t with about 4.8 million rows was created from dba_objects. The secondary column contains values S, T, J (over 99% of rows) and a few other values.
SQL> SELECT secondary, COUNT(*) FROM t GROUP BY secondary;
SECONDARY    COUNT(*)
---------- ----------
W                 273
Q                   9
D                 273
T              421230
J             1866592
E                  99
S             2470733
Index Creation
Two indexes were built on secondary:
SQL> CREATE INDEX IND_SEC_NORMAL ON t(secondary);
SQL> CREATE INDEX IND_T_FUN ON t(
  DECODE(secondary,'S',NULL,'J',NULL,'T',NULL,secondary));
The normal index occupied about 72 MiB (75.5 MB; 80 extents, 9216 blocks), while the function index used only its initial allocation of 64 KiB (1 extent, 8 blocks). Both byte figures are consistent with an 8 KiB block size.
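The article does not show how the size figures were obtained. One common way, sketched here under the assumption that both indexes live in the current schema, is to query the USER_SEGMENTS dictionary view; size_mib is a hypothetical column alias.

```sql
-- Sketch: compare the space allocated to the two indexes.
-- Assumes the indexes are owned by the current user.
SELECT segment_name,
       bytes,
       ROUND(bytes / 1024 / 1024, 2) AS size_mib,  -- hypothetical alias
       extents,
       blocks
FROM   user_segments
WHERE  segment_type = 'INDEX'
AND    segment_name IN ('IND_SEC_NORMAL', 'IND_T_FUN')
ORDER  BY bytes DESC;
```

Note that this reports allocated space, so a very small index still shows at least its initial extent.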
Performance Comparison
Querying rows where secondary='W' using the normal index:
SQL> SELECT * FROM t WHERE secondary='W';
-- Execution time: 00:00:00.37
-- Cost: 11
-- Consistent gets: 272
-- Physical reads: 21
Using the function index:
SQL> SELECT * FROM t WHERE DECODE(secondary,'S',NULL,'J',NULL,'T',NULL,secondary)='W';
-- Execution time: 00:00:00.04
-- Cost: 116
-- Consistent gets: 140
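-- Physical reads: 0
With the function index, elapsed time fell from 0.37 s to 0.04 s, consistent gets dropped from 272 to 140, and no physical reads were needed. The optimizer's estimated cost, however, rose from 11 to 116, which the source attributes to the overhead of the DECODE expression. Because the two queries ran one after the other, part of the difference in elapsed time and physical reads may also reflect blocks already cached by the first query.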
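The figures above report timing and I/O statistics but not the execution plan itself. A minimal sketch for checking that the optimizer actually chooses the function index, using the standard EXPLAIN PLAN statement and DBMS_XPLAN.DISPLAY, might look like this:

```sql
-- Sketch: confirm which access path the optimizer picks for the rewritten predicate.
EXPLAIN PLAN FOR
SELECT *
FROM   t
WHERE  DECODE(secondary, 'S', NULL, 'J', NULL, 'T', NULL, secondary) = 'W';

-- Display the plan just explained; an index scan on IND_T_FUN indicates
-- that the function index is being used.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan shows a full table scan instead, the predicate probably does not match the indexed expression exactly, or the statistics are stale.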
Statistics Gathering
SQL> EXEC dbms_stats.gather_table_stats(user, 'T', cascade=>TRUE, estimate_percent=>100, method_opt=>'FOR ALL INDEXED COLUMNS');
Conclusions and Limitations
The function index dramatically shrinks index size and speeds up queries that filter on the low-frequency values it still contains.
Benefits are noticeable only on large tables with strong value skew; small tables may not see a net gain.
Some additional CPU usage is expected because the DECODE expression must be evaluated, both when Oracle maintains the index during inserts and updates and whenever a query applies the expression to rows without using the index.
The technique helps only queries on rare values. The high-frequency values are mapped to NULL, so queries for them cannot use the function index, and queries must filter on the same DECODE expression, as in the test above, for the optimizer to consider it.
Confirm that the table is large enough and the skew significant before adopting this method.
Overall, applying a decode-based function index is an effective way to handle column-value skew in Oracle databases, reducing storage overhead and improving query performance for the rare values.
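Whether a column is skewed enough to justify the technique can be checked before building anything. The following sketch, which assumes the table t and column secondary from the experiment, uses Oracle's RATIO_TO_REPORT analytic function to show each value's share of the rows; row_count and pct_of_rows are hypothetical aliases.

```sql
-- Sketch: measure value skew before deciding which values to map to NULL.
SELECT secondary,
       COUNT(*) AS row_count,
       ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 3) AS pct_of_rows
FROM   t
GROUP  BY secondary
ORDER  BY row_count DESC;
```

In the experiment's data, the rare values W, Q, D, and E together account for only 654 of roughly 4.76 million rows, which is the kind of distribution where excluding the dominant values pays off.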
ITPUB
