Databases 10 min read

Uncovering Oracle’s Hidden Parse Failures: Why AWR Shows 99% Parse Time

A customer’s Oracle 11g RAC experienced chronic high CPU usage; deep AWR analysis revealed that massive parse time was attributed to mutex waits and a hidden hard‑parse failure caused by a missing column, which triggered Bug 16175381 and forced process spin, ultimately resolved by correcting the faulty SQL.

dbaplus Community
dbaplus Community
dbaplus Community
Uncovering Oracle’s Hidden Parse Failures: Why AWR Shows 99% Parse Time

Fault Scenario

A client’s core CRM database (Oracle 11gR2 11.2.0.2, two‑node RAC) showed CPU utilization around 75% on one node, spiking to 100% during peak periods. Despite repeated ASH and AWR investigations, DBA could not pinpoint the root cause and resorted to frequent host reboots.

During the failure windows, the database exhibited intense CPU activity, mutex S waits on the shared pool cursor, and an average of 43 active sessions with empty SQL_ID values—an unusual situation because long‑running SQL should have a SQL_ID.

AWR Analysis During Fault

The AWR report indicated a CPU AAS of 48 (64 after hyper‑threading), confirming CPU overload. Top‑5 wait events highlighted cursor:mutex S accounting for 25% of waits, alongside DB CPU.

Parsing time dominated the time model (99.81%), yet both hard‑parse elapsed time and failed‑parse elapsed time were near zero, contradicting expectations given the heavy parsing load.

Bug Identification

Investigation of mutex information in AWR showed that the mutex sleep locations concentrated on kkshGetNextChild[KKSHBKLOC1]. A search on MOS identified Bug 16175381 – “Process spin in kkscsSearchChildList() → kkshGetNextChild()” (also related to Bug 14613900). The hypothesis was that failed parses triggered process spin.

Normal Period AWR Analysis

Comparing a normal‑operation AWR report, the top‑5 wait event shifted to SQL*Net break/reset to client , indicating normal client‑database interaction errors. Approximately 13,148 parse errors occurred within one hour (≈3.6 failures per second), ruling out manual DBA errors.

Locating the Erroneous SQL

Using event 10035, a short‑lived trace was enabled in production:

ALTER SYSTEM SET EVENTS '10035 trace name context forever, level 1';
ALTER SYSTEM SET EVENTS '10035 trace name context off';

The trace revealed the offending statement:

PARSE ERROR: ospid=12597, error=904 for statement:
select nvl(CantonCode,'0') as CantonCode from xxxx.yyyyy where recID = :1;

Verification via sqlplus showed that table xxxx.yyyyy lacked the column CantonCode (only CANTONCODEEX existed). The mismatch stemmed from unsynchronized schema version control between the client and the developer, and fixing the SQL eliminated the issue.

Analysis Summary

The high CPU and dominant cursor:mutex S wait events were traced to the erroneous SQL that triggered Bug 16175381, causing process spin. By capturing event 10035 traces, the faulty statement was identified and corrected.

Open Questions

Why did the AWR time model report virtually 0% failed‑parse elapsed time despite numerous parse failures? Tests on Oracle 11.2.0.4 showed that when bind variables are used or cursor_sharing=force, failed‑parse time is largely omitted from the time model, making the metric misleading.

Test 1 – Non‑bind Variable SQL, cursor_sharing=exact

Executed 100,000 updates on a non‑existent table yyyy (e.g., update yyyy set n = n + 1;) and generated an AWR report. Failed‑parse elapsed time accounted for 68% of the time model, with a parse‑failure rate of 3,352 per second.

Test 2 – Non‑bind Variable SQL, cursor_sharing=force

Same workload with cursor_sharing=force. Failed‑parse elapsed time dropped to 11% of the time model (≈1/6 of Test 1), though the parse‑failure rate remained high (3,465 per second).

Test 3 – Bind Variable SQL, cursor_sharing=exact

Executed 100,000 updates using a bind variable ( update yyyy set n = n + :n;) after defining variable n number; exec :n := 1;. Failed‑parse elapsed time fell to 2.9% of the time model, with a parse‑failure rate of 3,459 per second.

Call‑Stack Differences

Comparing stack traces showed distinct call paths for failed parses with and without bind variables. The failed‑parse timing appears to be recorded in functions such as kksLoadChild, opiosq0, and kksSetBindType when bind variables are involved.

References

How to Identify Hard Parse Failures (Document ID 1353015.1)

Oracle Dynamic Performance Views – V$SESS_TIME_MODEL documentation

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performancemutexCPUbugOracleAWRParse
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.