Why Did Oracle 10g Hang? Uncovering Log File Sync Bottlenecks and Storage I/O Issues
An Oracle 10g database froze at 10 am with sessions piled up on log file sync waits. AWR analysis traced the hang to storage I/O performance rather than excessive commit activity. The diagnosis was checked against commit frequency, log switch frequency, and log file parallel write latency, then confirmed by errors in the alert log. Practical remediation steps follow.
1. Abnormal Wait Analysis
The database became completely unresponsive at 10 am. AWR reports that log file sync accounts for 64.2% of wait events; this event belongs to the Commit wait class.
Log file sync occurs when a session commits: the session waits until LGWR has flushed all redo generated by the transaction from the log buffer to the redo log file, which is what guarantees durability.
2. Root‑Cause Investigation
The two most common causes of high log file sync wait are:
LGWR I/O performance degradation
Excessive application commits
2.1 Analyze Program Commits
Divide user calls by (user commits + user rollbacks) to get the average number of calls per commit or rollback. The average user calls/(commits+rollbacks) is 60.85, meaning roughly one commit or rollback occurs every 61 calls, so commit frequency is not abnormal.
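The ratio above is simple arithmetic on three AWR counters. A minimal sketch follows; the function name and the counter values are hypothetical, and the values were chosen only so that the result reproduces the 60.85 figure quoted above.

```python
def calls_per_commit(user_calls: int, user_commits: int, user_rollbacks: int) -> float:
    """Average number of user calls per commit or rollback in the interval."""
    transactions = user_commits + user_rollbacks
    if transactions == 0:
        raise ValueError("no commits or rollbacks recorded in the interval")
    return user_calls / transactions


# Hypothetical counter deltas for one AWR interval (not taken from the article).
ratio = calls_per_commit(user_calls=608_500, user_commits=9_500, user_rollbacks=500)
print(f"user calls per commit/rollback: {ratio:.2f}")
```

A low value means the application commits after very little work, which would point to commit frequency as the cause; a value around 61 does not.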
2.2 Check Log Switch Frequency
Oracle recommends a log file switch every 15-20 minutes (3-4 times per hour). If the number of switches per hour exceeds this range, the redo log files are likely too small.
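One way to apply this rule is to bucket log switch timestamps by clock hour and flag the hours that exceed four switches. The sketch below assumes the timestamps have already been fetched (for example from the FIRST_TIME column of V$LOG_HISTORY); the function names and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(switch_times):
    """Count log switches in each clock hour."""
    return Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in switch_times
    )


def busy_hours(switch_times, max_per_hour=4):
    """Return the hours whose switch count exceeds the recommended maximum."""
    counts = switches_per_hour(switch_times)
    return {hour: n for hour, n in sorted(counts.items()) if n > max_per_hour}


# Hypothetical switch times: three switches at 09:xx, seven at 10:xx.
sample = [datetime(2024, 1, 1, 9, m) for m in (5, 25, 45)]
sample += [datetime(2024, 1, 1, 10, m) for m in (2, 9, 15, 21, 30, 44, 58)]

for hour, n in busy_hours(sample).items():
    print(f"{hour:%Y-%m-%d %H}:00 had {n} log switches")
```

Hours flagged this way are candidates for larger redo log files, under the 3-4 switches per hour guideline stated above.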
2.3 Analyze I/O Performance
Compare the average wait times of log file sync and log file parallel write. The data shows that a large portion of log file sync time is spent in log file parallel write, which indicates that the I/O subsystem is the bottleneck.
Log file parallel write latency should typically stay within 5-10 ms; higher values suggest storage I/O problems.
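The comparison can be sketched as follows: derive the average wait per event from total waits and total time waited, then check how much of the log file sync average is covered by the log file parallel write average. The function names, the 10 ms threshold parameter, and all figures are hypothetical illustrations, not values from the AWR report discussed here, and the ratio of two averages is only a rough heuristic.

```python
def avg_wait_ms(time_waited_s: float, total_waits: int) -> float:
    """Average wait per event in milliseconds."""
    if total_waits == 0:
        return 0.0
    return time_waited_s * 1000.0 / total_waits


def write_share(sync_avg_ms: float, write_avg_ms: float) -> float:
    """Rough share of the log file sync average explained by LGWR write time."""
    return write_avg_ms / sync_avg_ms if sync_avg_ms else 0.0


# Hypothetical AWR figures (total time waited in seconds, number of waits).
sync_avg = avg_wait_ms(time_waited_s=12_000, total_waits=400_000)
write_avg = avg_wait_ms(time_waited_s=9_000, total_waits=375_000)

share = write_share(sync_avg, write_avg)
io_suspect = write_avg > 10.0  # upper end of the 5-10 ms guideline

print(f"log file sync avg: {sync_avg:.1f} ms")
print(f"log file parallel write avg: {write_avg:.1f} ms")
print(f"write share of sync time: {share:.0%}, storage suspect: {io_suspect}")
```

When the write average is small but the sync average is large, the time is being lost elsewhere (for example CPU scheduling of LGWR), so the two numbers should always be read together.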
3. Alert Log Verification
The alert.log contains errors that confirm the diagnosis. During a log switch, all private strands must be flushed to the current log before the switch can proceed; the errors indicate that not all redo had been written when the switch was attempted.
Private strands, introduced in Oracle 10gR2, are private redo areas, each protected by its own redo allocation latch, so multiple sessions can generate redo concurrently with less latch contention.
4. Recommendations
Avoid placing redo logs on legacy mechanical disks; they can cause severe log file sync waits during write peaks.
Monitor all processes that write to the same storage path and ensure the disk provides sufficient bandwidth for the required throughput.
Keep LOG_BUFFER size reasonable; an excessively large log buffer increases flush wait times.
dbaplus Community
