Why Does Oracle RAC Throw ‘IPC Send timeout’ and How to Fix It?
The article explains the common Oracle RAC “IPC Send timeout” error, shows real alert‑log examples, analyzes root causes such as network loss, resource exhaustion, or bugs, and provides step‑by‑step troubleshooting methods including monitoring tools and configuration checks.
In Oracle RAC environments, the alert log may contain the message “IPC Send timeout”, often followed by ORA‑29740 or “Waiting for clusterware split‑brain resolution”, which can cause an instance to terminate or be evicted from the cluster.
Example Alert Logs
Instance 1:
Thu Jul 02 05:24:50 2012
IPC Send timeout detected. Sender: ospid 6143755
Receiver: inst 2 binc 1323620776 ospid 49715160
Thu Jul 02 05:24:51 2012
IPC Send timeout to 1.7 inc 120 for msg type 65516 from opid 13
Communications reconfiguration: instance_number 2
Waiting for clusterware split‑brain resolution
Trace dumping is performing id=[cdmp_20120702052451]
Evicting instance 2 from cluster
Instance 2:
Thu Jul 02 05:24:50 2012
IPC Send timeout detected. Receiver ospid 49715160
Errors in file /u01/oracle/product/admin/sales/bdump/sales2_lmon_6257780.trc:
ORA‑29740: evicted by member 0, group incarnation 122
LMON: terminating instance due to error 29740
ORA‑29740: evicted by member , group incarnation
How RAC Communication Works
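The main inter‑instance processes are LMON (Global Enqueue Service Monitor), LMD (Global Enqueue Service Daemon) and LMS (Global Cache Service Process). When one of them sends a message to a peer instance, it expects an acknowledgment within the timeout, which defaults to 300 seconds; if none arrives in that time, the sender reports “IPC Send timeout”.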
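Because the sender is the one that reports the timeout, the first practical step is usually to find out which receiver failed to answer. The following is a minimal sketch, assuming the alert log uses the same line format as the excerpts shown earlier; the function name find_ipc_timeouts and both regular expressions are illustrative and may need adjusting for other Oracle versions.

```python
import re

# Patterns modeled on the alert-log excerpts in this article (assumption:
# other Oracle versions may format these lines differently).
SENDER_RE = re.compile(r"IPC Send timeout detected\.\s*Sender: ospid (\d+)")
RECEIVER_RE = re.compile(r"Receiver: inst (\d+) binc (\d+) ospid (\d+)")


def find_ipc_timeouts(lines):
    """Return one dict per sender-side 'IPC Send timeout detected' event."""
    events = []
    current = None
    for line in lines:
        sender = SENDER_RE.search(line)
        if sender:
            # Start a new event when the sender line appears.
            current = {"sender_ospid": int(sender.group(1))}
            events.append(current)
            continue
        receiver = RECEIVER_RE.search(line)
        if receiver and current is not None and "receiver_inst" not in current:
            # Attach the first receiver line that follows the sender line.
            current["receiver_inst"] = int(receiver.group(1))
            current["receiver_ospid"] = int(receiver.group(3))
    return events


sample = [
    "Thu Jul 02 05:24:50 2012",
    "IPC Send timeout detected. Sender: ospid 6143755",
    "Receiver: inst 2 binc 1323620776 ospid 49715160",
]
events = find_ipc_timeouts(sample)
print(events)
```

The receiver instance and ospid returned here indicate which node's OS and network metrics to examine for the time of the error.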
Typical Causes
Network problems causing packet loss or abnormal communication.
Host resource issues (CPU, memory, I/O) that prevent the processes from being scheduled or responding.
Oracle bugs (relatively rare compared with the first two causes).
Monitoring Recommendations
To diagnose these issues, OS and network monitoring tools are essential. Installing OSWBB (OSWatcher Black Box) is recommended for collecting vmstat, iostat and netstat data.
Case Study 1 – Resource Exhaustion
The alert log identifies the receiver as a process on node 2 (ospid 1596935). The vmstat snapshots from that node show CPU usage reaching 100 % and a run queue well above the number of logical CPUs, so the receiver could not be scheduled in time to respond.
System Configuration: lcpu=32 mem=128000MB
...
25 1 7532667 19073986 0 0 0 0 5 0 9328 88121 20430 32 10 47 11
58 0 7541201 19065392 0 0 0 0 0 0 11307 177425 10440 87 13 0 0 <== idle CPU = 0 (CPU 100 %)
...
Case Study 2 – Network Surge
The alert log points to node 2 as the receiver. The netstat output shows the input packet count (Ipkts) on interface en1 jumping by roughly 160,000 to 235,000 packets within a 30‑second window, while vmstat shows no CPU or memory pressure.
Node2:
en1 10.182.3.2 Ipkts 4073847798 → 4074082951 (increase 235 153 packets/30 s)
Node1:
en1 10.182.3.1 Ipkts 502159550 → 502321317 (increase 161 767 packets/30 s)
This burst can trigger communication failures; checking the network, ensuring consistent MTU settings, and possibly restarting or swapping switches are advised.
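The packet increases quoted above are simple differences between two consecutive netstat samples. A small sketch of that arithmetic follows, assuming samples taken 30 seconds apart as in this case; the helper name packet_rate is hypothetical, and counter wrap-around is not handled.

```python
def packet_rate(ipkts_before, ipkts_after, interval_s):
    """Return (packet delta, packets per second) between two Ipkts samples."""
    delta = ipkts_after - ipkts_before
    return delta, delta / interval_s


# Ipkts values taken from the netstat excerpts in this case study.
samples = {
    "node2 en1": (4073847798, 4074082951),
    "node1 en1": (502159550, 502321317),
}

for name, (before, after) in samples.items():
    delta, rate = packet_rate(before, after, 30)
    print(f"{name}: +{delta} packets in 30 s ({rate:.0f} packets/s)")
```

Comparing these deltas against samples from a quiet period on the same interface is what shows whether a given 30‑second window is actually a surge.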
Case Study 3 – I/O Problem
The alert log shows the receiver as LMON on node 1. The associated trace file contains an I/O‑related call stack, indicating the process was stalled on disk operations.
kjxgmpoll: stalled for 94 seconds (threshold 42 sec)
--- Call Stack Trace ---
... kfk_io1 → kfkRequest → kfk_transitIO → kfioSubmitIO → ...
Practical Troubleshooting Checklist
Use Oracle’s Cluster Health Monitor (CHM) output, if available, to review resource and network usage at the time of the error.
If CHM is not present, install OSWBB to capture OS‑level metrics.
Check for UDP/IP packet loss or other network errors.
Verify that all nodes share identical network settings, especially MTU; ensure switches support Jumbo Frames when used.
Confirm that CPU utilization is not saturated and that sufficient memory is available.
Investigate whether the instance experienced a database hang or severe performance degradation before eviction.
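The CPU check in the list above can be partly automated against collected vmstat data. The sketch below assumes the AIX vmstat column order (r b avm fre re pi po fr sr cy in sy cs us sy id wa), which matches the 17 numeric fields in the Case Study 1 excerpt; the function names and the idle threshold of 5 % are illustrative choices, not Oracle recommendations.

```python
def parse_vmstat_row(row):
    """Parse one vmstat data row, assuming the AIX 17-column layout."""
    fields = row.split()[:17]  # ignore trailing annotations such as "<== ..."
    if len(fields) < 17:
        raise ValueError("expected at least 17 numeric columns")
    nums = [int(f) for f in fields]
    return {
        "run_queue": nums[0],  # r
        "user": nums[13],      # us
        "system": nums[14],    # sy
        "idle": nums[15],      # id
        "wait": nums[16],      # wa
    }


def is_cpu_saturated(sample, lcpu, idle_threshold=5):
    """Flag a sample with almost no idle CPU and more runnable threads than CPUs."""
    return sample["idle"] <= idle_threshold and sample["run_queue"] > lcpu


# Rows taken from the Case Study 1 excerpt (lcpu=32).
rows = [
    "25 1 7532667 19073986 0 0 0 0 5 0 9328 88121 20430 32 10 47 11",
    "58 0 7541201 19065392 0 0 0 0 0 0 11307 177425 10440 87 13 0 0 <== idle CPU = 0",
]
results = [is_cpu_saturated(parse_vmstat_row(r), lcpu=32) for r in rows]
print(results)
```

On other platforms the vmstat column order differs, so the field indexes would need to be adjusted before the same check is applied.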
For additional details, see the Oracle MOS document “Top 5 issues for Instance Eviction” (Doc ID 1374110.1).
ITPUB
