Uncovering Oracle’s Hidden Parse Failures: Why AWR Shows 99% Parse Time
A customer’s Oracle 11g RAC experienced chronic high CPU usage; deep AWR analysis revealed that massive parse time was attributed to mutex waits and a hidden hard‑parse failure caused by a missing column, which triggered Bug 16175381 and forced process spin, ultimately resolved by correcting the faulty SQL.
Fault Scenario
A client’s core CRM database (Oracle 11gR2 11.2.0.2, two‑node RAC) showed CPU utilization around 75% on one node, spiking to 100% during peak periods. Despite repeated ASH and AWR investigations, DBA could not pinpoint the root cause and resorted to frequent host reboots.
During the failure windows, the database exhibited intense CPU activity, mutex S waits on the shared pool cursor, and an average of 43 active sessions with empty SQL_ID values—an unusual situation because long‑running SQL should have a SQL_ID.
AWR Analysis During Fault
The AWR report indicated a CPU AAS of 48 (64 after hyper‑threading), confirming CPU overload. Top‑5 wait events highlighted cursor:mutex S accounting for 25% of waits, alongside DB CPU.
Parsing time dominated the time model (99.81%), yet both hard‑parse elapsed time and failed‑parse elapsed time were near zero, contradicting expectations given the heavy parsing load.
Bug Identification
Investigation of mutex information in AWR showed that the mutex sleep locations concentrated on kkshGetNextChild[KKSHBKLOC1]. A search on MOS identified Bug 16175381 – “Process spin in kkscsSearchChildList() → kkshGetNextChild()” (also related to Bug 14613900). The hypothesis was that failed parses triggered process spin.
Normal Period AWR Analysis
Comparing a normal‑operation AWR report, the top‑5 wait event shifted to SQL*Net break/reset to client , indicating normal client‑database interaction errors. Approximately 13,148 parse errors occurred within one hour (≈3.6 failures per second), ruling out manual DBA errors.
Locating the Erroneous SQL
Using event 10035, a short‑lived trace was enabled in production:
ALTER SYSTEM SET EVENTS '10035 trace name context forever, level 1'; ALTER SYSTEM SET EVENTS '10035 trace name context off';The trace revealed the offending statement:
PARSE ERROR: ospid=12597, error=904 for statement: select nvl(CantonCode,'0') as CantonCode from xxxx.yyyyy where recID = :1;Verification via sqlplus showed that table xxxx.yyyyy lacked the column CantonCode (only CANTONCODEEX existed). The mismatch stemmed from unsynchronized schema version control between the client and the developer, and fixing the SQL eliminated the issue.
Analysis Summary
The high CPU and dominant cursor:mutex S wait events were traced to the erroneous SQL that triggered Bug 16175381, causing process spin. By capturing event 10035 traces, the faulty statement was identified and corrected.
Open Questions
Why did the AWR time model report virtually 0% failed‑parse elapsed time despite numerous parse failures? Tests on Oracle 11.2.0.4 showed that when bind variables are used or cursor_sharing=force, failed‑parse time is largely omitted from the time model, making the metric misleading.
Test 1 – Non‑bind Variable SQL, cursor_sharing=exact
Executed 100,000 updates on a non‑existent table yyyy (e.g., update yyyy set n = n + 1;) and generated an AWR report. Failed‑parse elapsed time accounted for 68% of the time model, with a parse‑failure rate of 3,352 per second.
Test 2 – Non‑bind Variable SQL, cursor_sharing=force
Same workload with cursor_sharing=force. Failed‑parse elapsed time dropped to 11% of the time model (≈1/6 of Test 1), though the parse‑failure rate remained high (3,465 per second).
Test 3 – Bind Variable SQL, cursor_sharing=exact
Executed 100,000 updates using a bind variable ( update yyyy set n = n + :n;) after defining variable n number; exec :n := 1;. Failed‑parse elapsed time fell to 2.9% of the time model, with a parse‑failure rate of 3,459 per second.
Call‑Stack Differences
Comparing stack traces showed distinct call paths for failed parses with and without bind variables. The failed‑parse timing appears to be recorded in functions such as kksLoadChild, opiosq0, and kksSetBindType when bind variables are involved.
References
How to Identify Hard Parse Failures (Document ID 1353015.1)
Oracle Dynamic Performance Views – V$SESS_TIME_MODEL documentation
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
