Why Did Our Oracle RAC Cluster Stall? A Real‑World AWR Diagnosis
A client reported a sudden Oracle database slowdown. A post‑mortem analysis of AWR and TFA data revealed global cache (GC) bottlenecks, RAC heartbeat packet loss, and an intermittently failing storage link; the problem was ultimately resolved by disabling the faulty port and restarting the affected node.
Last Tuesday morning a client called to report severe Oracle database performance degradation. Because remote operations were not logged, the engineer had to investigate after the fact, relying on AWR and TFA data.
The AWR top‑event report for node 2 between 08:00 and 08:40 showed clear global cache (GC) problems: gc cr failure waits, plus extremely long gc cr block flush and gc current block flush times, which point to disk‑write latency.
RAC metrics showed heartbeat (private interconnect) traffic near 50 MB/s. The gc cr block flush and gc current block flush times remained very slow, while log flushes appeared normal. An ifconfig check showed missed RX packets, suggesting packet loss on the heartbeat network.
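A single ifconfig reading only shows cumulative counters, so it helps to compare two readings taken some time apart. The sketch below is one minimal way to do that, assuming a Linux host where the same counters are exposed through /proc/net/dev. The interface name eth1 and all sample numbers are hypothetical, and how the "drop" column maps onto the "missed" figure seen in ifconfig is an assumption to verify on the actual system.

```python
# Minimal sketch: compare two snapshots of /proc/net/dev (Linux) to see whether
# receive-side drops on the interconnect NIC are still growing.
# The interface name "eth1" and every number below are hypothetical.

SNAPSHOT_BEFORE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth1: 500000000 1000000 0 100 0 0 0 0 400000000 900000 0 0 0 0 0 0
"""

SNAPSHOT_AFTER = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 2000 20 0 0 0 0 0 0 2000 20 0 0 0 0 0 0
  eth1: 900000000 2000000 0 250 0 0 0 0 800000000 1800000 0 0 0 0 0 0
"""


def parse_rx_counters(text):
    """Return {interface: {"packets", "errs", "drop", "fifo"}} for the receive side."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        counters[name.strip()] = {
            "packets": int(fields[1]),
            "errs": int(fields[2]),
            "drop": int(fields[3]),
            "fifo": int(fields[4]),
        }
    return counters


def rx_delta(before_text, after_text, iface):
    """Return how much each receive counter grew between the two snapshots."""
    before = parse_rx_counters(before_text)[iface]
    after = parse_rx_counters(after_text)[iface]
    return {key: after[key] - before[key] for key in before}


if __name__ == "__main__":
    delta = rx_delta(SNAPSHOT_BEFORE, SNAPSHOT_AFTER, "eth1")
    print("eth1 receive counters grew by:", delta)
```

A drop counter that keeps growing between snapshots means loss is ongoing; a large but static counter may be left over from an earlier event. As the rest of this case shows, even real packet loss is not proof that the interconnect is the root cause.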
From around 08:08 the database began behaving abnormally, and node 1 rebooted at 08:21. To restore service quickly during the morning peak, the engineer advised stopping node 1 and running the workload on node 2 alone; once the node 1 instance was shut down, the business workload recovered.
Further analysis compared the RAC data with the previous day's peak period and showed that heartbeat traffic during the incident was significantly lower, which argued against an overloaded interconnect. Storage logs started showing errors around 07:40, and the historical wait events confirmed massive gcs log flush sync waits.
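Putting the timestamps from the different sources on one timeline makes the ordering easier to see. The sketch below uses only the three times quoted in this write-up; the tuple layout and function name are illustrative choices, and the date is omitted because the article does not give it.

```python
from datetime import datetime

# Times as quoted in this write-up, deliberately listed out of order.
# Only HH:MM is used because the article does not state the date.
EVENTS = [
    ("08:08", "database", "abnormal behavior begins"),
    ("08:21", "node 1", "node reboots"),
    ("07:40", "storage", "errors start appearing in the storage logs"),
]


def build_timeline(events):
    """Sort events by time and add minutes elapsed since the earliest one."""
    parsed = [
        (datetime.strptime(clock, "%H:%M"), source, what)
        for clock, source, what in events
    ]
    parsed.sort(key=lambda item: item[0])
    first = parsed[0][0]
    return [
        (ts.strftime("%H:%M"), int((ts - first).total_seconds() // 60), source, what)
        for ts, source, what in parsed
    ]


if __name__ == "__main__":
    for clock, offset, source, what in build_timeline(EVENTS):
        print(f"{clock}  +{offset:>2} min  {source:<8}  {what}")
```

Sorted this way, the storage errors come first, 28 minutes before the database symptoms and 41 minutes before the node 1 reboot, which is consistent with the storage link being the cause rather than a side effect.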
The root cause was identified as an intermittently failing storage‑link port; disabling this faulty port allowed node 1 to start and the entire cluster to return to normal operation.
Key takeaways: slow GCS log flushes or packet loss do not automatically imply a heartbeat issue; storage‑link problems can cause DBWR write‑to‑disk failures, which in turn block LGWR. High GC CR block flush times are a clear sign of underlying disk‑write anomalies.
ITPUB
