Cutting Oracle RAC CPU Usage from 100% to 12 ms: A Step‑by‑Step SQL Tuning Case Study
During a morning peak on December 15, 2020, a customer's Oracle RAC cluster hit 100% CPU on both nodes, causing severe response delays, and after detailed session analysis and SQL rewrite the CPU load dropped dramatically, restoring normal service performance.
Problem Overview
On the morning of 2020‑12‑15 the zCloud monitoring system reported that both RAC nodes of an Oracle database reached 100 % CPU, causing a large backlog of sessions and slow application response.
Root Cause Analysis
Identify the OS user responsible for high CPU Running top showed the CPU consumption was dominated by user time. The top processes belonged to the oracle OS user, confirming that the database front‑end was the source.
Locate the offending SQL in the database Using the zCloud "Top 5 Wait Event" view, most sessions were waiting on ON CPU and latch:cache buffers chains . Filtering by this wait event isolated a single SQL ID cuk5cguanbaqt , which was executed repeatedly and caused latch contention. The SQL averaged more than two minutes per execution and ran 611 times within one hour, generating a flood of sessions waiting on the latch and driving CPU to its limit.
SQL Performance Bottleneck
Executing the SQL (with bind variables replaced) in sqlplus and examining the SQL Monitor revealed that the bottleneck was a MERGE JOIN. Both branches of the join performed about nine million row accesses, indicating excessive index and table scans.
Optimization Techniques
Reduce repeated index and table accesses Extract the two sub‑queries of the MERGE‑JOIN into a WITH clause. This forces each index and table to be accessed only once and clarifies the logical flow.
Eliminate unnecessary joins The original LEFT JOIN on order_id was redundant because order_id is the primary key of the two joined tables. Removing the join reduced one table association.
Filter the large table C early Table C returned millions of rows without any predicate. Adding an EXISTS condition that links C to O allows early filtering before the join with ORF , dramatically shrinking the intermediate result set. Statistics showed that C produced over 8.4 million rows (close to the full 9 million), while O after filtering produced fewer than 2 000 rows. The rewritten query therefore processes far fewer rows.
Optimized SQL
The final rewritten statement is shown in the image below:
Results
Before optimization the SQL took about 154 seconds per execution (average reported by zCloud ≈ 2 minutes). After the rewrite the same statement finishes in roughly 12 ms.
Logical reads dropped from over 110 million per execution to about 10 thousand, yielding a performance gain of more than 10 000×.
Original article: https://www.modb.pro/db/43085
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
