What Happens When a Single Bad SQL Slows Down an Entire Data Warehouse? Real Cases & Fixes
This article presents four real‑world Oracle SQL cases—from a Cartesian join that halted nightly reports to a varchar primary key causing full scans, non‑standard date syntax breaking partition pruning, and missing statistics leading to unstable plans—detailing analysis, execution‑plan insights, and concrete remediation steps.
Case 1 – Cartesian Join Causing Massive Runtime
Background : In a large e‑commerce data warehouse nightly jobs generate business reports. At 6 am a monitoring alert reported that reports expected to finish by 8 am were still running at 10 am. The DBA traced the slowdown to a newly added SQL that was being executed repeatedly.
Problem
The query joined two large tables without any join condition, producing a Cartesian product. The optimizer estimated 3505 TB of rows, 127 PB of data, a cost of 9890 G and a runtime of 999:59:59.
SELECT /*+ INDEX (A1 xxxxx) */ SUM(A2.CRKSL), SUM(A2.CRKSL*A2.DJ)
FROM xxxx A2, xxxx A1
WHERE A2.CRKFLAG = xxx
AND A2.CDATE >= xxx
AND A2.CDATE < xxx;Resolution
Rewrite the statement with an explicit join predicate (e.g., JOIN ... ON A2.id = A1.id) or add the missing condition. After the fix the optimizer chose an indexed access path and the nightly jobs completed within the expected window.
Case 2 – VARCHAR2 Primary Key Prevents Index Range Scan
Background : An ERP system with a decade‑old schema stored massive data. A cleanup job that should delete < 10 % of rows based on the primary key used a VARCHAR2 primary key, causing the optimizer to perform full table scans.
Experiment
Two tables were created, each with 3.2 million rows:
CREATE TABLE t1 AS SELECT * FROM dba_objects WHERE 1=0;
ALTER TABLE t1 ADD id INT PRIMARY KEY;
CREATE TABLE t2 AS SELECT * FROM dba_objects WHERE 1=0;
ALTER TABLE t2 ADD id VARCHAR2(10) PRIMARY KEY;
-- Insert 3.2 M rows into each table (omitted for brevity)Range query on the numeric key: SELECT * FROM t1 WHERE id >= 3199990; Result: 11 rows returned, plan shows INDEX RANGE SCAN.
Range query on the character key: SELECT * FROM t2 WHERE id >= '3199990'; Result: 755 565 rows returned, plan shows FULL TABLE SCAN. The character values are ordered lexicographically, not numerically, inflating the result set and causing a huge clustering factor.
Analysis
SELECT table_name, index_name, leaf_blocks, num_rows, clustering_factor
FROM user_indexes
WHERE table_name IN ('T1','T2');
TABLE_NAME INDEX_NAME LEAF_BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- ------------ ------------ --------- -----------------
T1 SYS_C0025294 6275 3200000 31520
T2 SYS_C0025295 13271 3200000 632615Solution
Use a numeric primary key, or rewrite the predicate to a closed range so that the index can be used efficiently:
SELECT * FROM t2 WHERE id BETWEEN '3199990' AND '3200000';Plan now shows INDEX RANGE SCAN with only six rows processed.
Case 3 – Non‑Standard Date Literals Prevent Partition Pruning
Background : In the same data warehouse a newly added SQL used to_date(20120208,'yyyy-mm-dd') (numeric literal without quotes) and a UNION ALL to split logic. The optimizer could not recognise the date literals, so it scanned all partitions, leading to an eight‑hour job.
Original Query
SELECT ...
FROM ...
WHERE (order_creation_date >= to_date(20120208,'yyyy-mm-dd')
AND order_creation_date < to_date(20120209,'yyyy-mm-dd'))
OR (send_date >= to_date(20120208,'yyyy-mm-dd')
AND send_date < to_date(20120209,'yyyy-mm-dd'))
AND NVL(a.bd_id,0) = 1;The execution plan showed a full scan of all partitions (Pstart=1, Pstop=92).
Fix
Quote the date literals and use a standard format:
SELECT ...
FROM ...
WHERE (order_creation_date >= TO_DATE('2012-02-08','YYYY-MM-DD')
AND order_creation_date < TO_DATE('2012-02-09','YYYY-MM-DD'))
OR (send_date >= TO_DATE('2012-02-08','YYYY-MM-DD')
AND send_date < TO_DATE('2012-02-09','YYYY-MM-DD'))
AND NVL(a.bd_id,0) = 1;After the change the optimizer pruned to a single partition (Pstart=57, Pstop=57) and the runtime dropped from >8 h to ~8 min.
Case 4 – Missing Statistics on Partitioned Index Cause Unstable Plan
Background : At month‑end a batch of reports slowed dramatically. An investigation revealed that a key SQL switched between a cheap nested‑loop join and an expensive hash join because statistics for a partitioned index were missing.
Analysis
The optimizer chose a nested‑loop join with cost 0 due to absent statistics, which performed poorly on large partitions.
Resolution
Gather accurate statistics for the affected index (or its partition) using DBMS_STATS:
EXEC dbms_stats.gather_index_stats(
ownname => 'XXX',
indname => 'XXX',
partname => 'PART_XXX',
estimate_percent => 10);After refreshing the statistics the optimizer selected the intended hash join and performance returned to normal.
Key Takeaways
A Cartesian join (missing join predicate) can explode row estimates and overload the system; always validate join conditions.
Choosing the correct data type for primary keys is critical. Numeric keys enable efficient range scans; character keys may cause lexical ordering issues and large clustering factors.
Use standard, quoted date literals in Oracle. Improper literals prevent partition pruning and can lead to full‑partition scans.
Up‑to‑date statistics are essential for the optimizer, especially for partitioned tables and indexes. Regularly gather statistics before high‑load windows such as month‑end.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
