
Why Does Oracle RAC Throw ‘IPC Send timeout’ and How to Fix It?

The article explains the common Oracle RAC “IPC Send timeout” error, shows real alert‑log examples, analyzes root causes such as network loss, resource exhaustion, or bugs, and provides step‑by‑step troubleshooting methods including monitoring tools and configuration checks.

ITPUB

In Oracle RAC environments, the alert log may contain the message “IPC Send timeout”, often followed by ORA‑29740 or “Waiting for clusterware split‑brain resolution”. The underlying communication failure can cause an instance to terminate or be evicted from the cluster.

Example Alert Logs

Instance 1:

Thu Jul 02 05:24:50 2012
IPC Send timeout detected. Sender: ospid 6143755
Receiver: inst 2 binc 1323620776 ospid 49715160
Thu Jul 02 05:24:51 2012
IPC Send timeout to 1.7 inc 120 for msg type 65516 from opid 13
Communications reconfiguration: instance_number 2
Waiting for clusterware split‑brain resolution
Trace dumping is performing id=[cdmp_20120702052451]
Evicting instance 2 from cluster

Instance 2:

Thu Jul 02 05:24:50 2012
IPC Send timeout detected. Receiver ospid 49715160
Errors in file /u01/oracle/product/admin/sales/bdump/sales2_lmon_6257780.trc:
ORA‑29740: evicted by member 0, group incarnation 122
LMON: terminating instance due to error 29740
ORA‑29740: evicted by member , group incarnation
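When reading logs like these, the first step is usually to pull out who the sender and receiver were and which instance was evicted. The following is a minimal sketch of that extraction in Python. The regular expressions are written against the Instance 1 excerpt above only; the function name and the returned dictionary keys are illustrative, and other Oracle versions may word these lines differently.

```python
import re

# Patterns derived from the Instance 1 excerpt above (assumption: other
# releases may format these lines differently).
SENDER_RE = re.compile(r"IPC Send timeout detected\.\s*Sender: ospid (\d+)")
RECEIVER_RE = re.compile(r"Receiver: inst (\d+) binc (\d+) ospid (\d+)")
EVICT_RE = re.compile(r"Evicting instance (\d+) from cluster")


def summarize_alert_log(text):
    """Collect sender, receiver and evicted-instance details from a log excerpt."""
    summary = {
        "sender_ospid": None,
        "receiver_inst": None,
        "receiver_ospid": None,
        "evicted_inst": None,
    }
    for line in text.splitlines():
        if (m := SENDER_RE.search(line)):
            summary["sender_ospid"] = int(m.group(1))
        elif (m := RECEIVER_RE.search(line)):
            summary["receiver_inst"] = int(m.group(1))
            summary["receiver_ospid"] = int(m.group(3))
        elif (m := EVICT_RE.search(line)):
            summary["evicted_inst"] = int(m.group(1))
    return summary


# Lines copied from the Instance 1 alert log above.
SAMPLE = """\
IPC Send timeout detected. Sender: ospid 6143755
Receiver: inst 2 binc 1323620776 ospid 49715160
IPC Send timeout to 1.7 inc 120 for msg type 65516 from opid 13
Evicting instance 2 from cluster
"""

print(summarize_alert_log(SAMPLE))
```

The receiver's instance number and ospid are the useful output here: they tell you which node's OS and network metrics to examine first.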

How RAC Communication Works

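The main background processes for inter‑instance communication are LMON (Global Enqueue Service Monitor), LMD (Global Enqueue Service Daemon) and LMS (Global Cache Service process). When one of them sends a message, the sender expects an acknowledgment from the receiver within a timeout that defaults to 300 seconds; if none arrives, the sender reports “IPC Send timeout” in the alert log.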
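The send-and-acknowledge pattern described here can be illustrated with a small Python model. This is a conceptual sketch only, not Oracle's implementation: the class name, method names and the use of plain numeric timestamps are all hypothetical, and only the 300-second default comes from the text.

```python
IPC_SEND_TIMEOUT_SECONDS = 300  # default value cited in the text


class PendingSends:
    """Toy model of a sender tracking messages that await acknowledgment."""

    def __init__(self, timeout=IPC_SEND_TIMEOUT_SECONDS):
        self.timeout = timeout
        self._sent_at = {}  # message id -> time the message was sent

    def send(self, msg_id, now):
        """Record that a message was sent at time `now` (in seconds)."""
        self._sent_at[msg_id] = now

    def ack(self, msg_id):
        """Record that the receiver acknowledged the message."""
        self._sent_at.pop(msg_id, None)

    def timed_out(self, now):
        """Return ids of messages still unacknowledged after the timeout."""
        return sorted(
            msg_id
            for msg_id, sent in self._sent_at.items()
            if now - sent >= self.timeout
        )


pending = PendingSends()
pending.send(1, now=0)
pending.send(2, now=10)
pending.ack(2)                      # message 2 is acknowledged promptly
print(pending.timed_out(now=299))   # message 1 is still inside the window
print(pending.timed_out(now=300))   # message 1 has now exceeded the timeout
```

The point of the model is that the timeout fires on the sender's side, while the cause usually lies with the receiver or the path to it, which is why the receiver's node is where diagnosis starts.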

Typical Causes

Network problems causing packet loss or abnormal communication.

Host resource issues (CPU, memory, I/O) that prevent the processes from being scheduled or responding.

Oracle bugs (relatively rare compared with the first two causes).

Monitoring Recommendations

To diagnose these issues, OS and network monitoring tools are essential. Installing OSWBB (OSWatcher Black Box) is recommended for collecting vmstat, iostat and netstat data.

Case Study 1 – Resource Exhaustion

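The alert log identifies the receiver as a process on node 2 (ospid 1596935). The vmstat snapshots from that node show idle CPU dropping to 0 % and the run queue rising to 58 on a host with 32 logical CPUs, indicating that the receiver could not be scheduled in time to respond.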

System Configuration: lcpu=32 mem=128000MB
... 
25 1 7532667 19073986 0 0 0 0 5 0 9328 88121 20430 32 10 47 11
58 0 7541201 19065392 0 0 0 0 0 0 11307 177425 10440 87 13 0 0  <== idle CPU = 0 (CPU 100 %)
...
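A check like the one applied to these snapshots can be automated. The sketch below parses vmstat data rows and flags CPU starvation. It assumes the 17-column AIX layout (r b avm fre re pi po fr sr cy in sy cs us sy id wa), which matches the rows above but is not stated in the source; the function names and the starvation rule (idle at 0 or run queue above the logical CPU count) are illustrative choices, not an Oracle-defined threshold.

```python
def parse_vmstat_row(line):
    """Parse one vmstat data row, assuming the 17-column AIX layout."""
    # Only the first 17 fields are numeric; anything after is annotation.
    nums = [int(field) for field in line.split()[:17]]
    return {
        "run_queue": nums[0],
        "blocked": nums[1],
        "user": nums[13],
        "sys": nums[14],
        "idle": nums[15],
        "wait": nums[16],
    }


def is_cpu_starved(row, lcpu):
    """Flag a snapshot with no idle CPU or more runnable threads than CPUs."""
    return row["idle"] == 0 or row["run_queue"] > lcpu


LCPU = 32  # from "System Configuration: lcpu=32"
rows = [
    "25 1 7532667 19073986 0 0 0 0 5 0 9328 88121 20430 32 10 47 11",
    "58 0 7541201 19065392 0 0 0 0 0 0 11307 177425 10440 87 13 0 0",
]
for raw in rows:
    row = parse_vmstat_row(raw)
    print(row["run_queue"], row["idle"], is_cpu_starved(row, LCPU))
```

Run against OSWBB archives around the timestamp of the alert-log entry, a filter like this narrows thousands of snapshots down to the few that matter.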

Case Study 2 – Network Surge

The alert log points to node 2 as the receiver. Netstat output for the interconnect interface shows inbound packet counts jumping by roughly 235,000 on node 2 and 162,000 on node 1 within a single 30‑second window, while vmstat shows no CPU or memory pressure.

Node2:
en1 10.182.3.2 Ipkts 4073847798 → 4074082951 (increase 235 153 packets/30 s)
Node1:
en1 10.182.3.1 Ipkts 502159550 → 502321317 (increase 161 767 packets/30 s)

A burst like this can disrupt interconnect communication. Check the network, confirm that MTU settings are consistent across nodes, and, if the problem persists, consider restarting or replacing the switch.
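The packet increases quoted in this case study follow directly from the two Ipkts readings on each node. A short Python sketch of the arithmetic, using the counter values above and a hypothetical helper name, is shown here. It does not handle counter wraparound, which would need extra care on interfaces with 32-bit counters.

```python
def packet_rate(before, after, interval_seconds):
    """Return (packets received, packets per second) between two readings."""
    delta = after - before
    return delta, delta / interval_seconds


# Ipkts readings for en1 taken 30 seconds apart, from the netstat output above.
readings = {
    "Node2": (4073847798, 4074082951),
    "Node1": (502159550, 502321317),
}

for node, (before, after) in readings.items():
    delta, rate = packet_rate(before, after, 30)
    print(f"{node}: +{delta} packets, {rate:.0f} packets/s")
```

Comparing the per-second rate with the same interface's rate in earlier snapshots is what shows whether the traffic is a genuine surge or just normal load.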

Case Study 3 – I/O Problem

The alert log shows the receiver as LMON on node 1. The associated trace file contains an I/O‑related call stack, indicating the process was stalled on disk operations.

kjxgmpoll: stalled for 94 seconds (threshold 42 sec)
--- Call Stack Trace ---
... kfk_io1 → kfkRequest → kfk_transitIO → kfioSubmitIO → ...
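Trace files can be long, so it helps to search them for the stall message and for I/O frames in the call stack. The following Python sketch does both. The stall pattern is taken from the single line shown above, and the list of frame names contains only the four functions visible in this excerpt; treating them as markers of an I/O stall is this article's reading, not a documented Oracle rule.

```python
import re

# Pattern based on the one kjxgmpoll line shown above (assumption: the
# wording is the same in other traces).
STALL_RE = re.compile(
    r"kjxgmpoll: stalled for (\d+) seconds \(threshold (\d+) sec\)"
)

# Frame names visible in the excerpt's call stack.
IO_FRAMES = ("kfk_io1", "kfkRequest", "kfk_transitIO", "kfioSubmitIO")


def find_stalls(trace_text):
    """Return (stalled_seconds, threshold_seconds) for each stall message."""
    return [(int(s), int(t)) for s, t in STALL_RE.findall(trace_text)]


def io_frames_in_stack(trace_text):
    """Return the known I/O frame names that appear in the trace text."""
    return [frame for frame in IO_FRAMES if frame in trace_text]


TRACE = """\
kjxgmpoll: stalled for 94 seconds (threshold 42 sec)
--- Call Stack Trace ---
... kfk_io1 → kfkRequest → kfk_transitIO → kfioSubmitIO → ...
"""

for stalled, threshold in find_stalls(TRACE):
    print(f"stalled {stalled}s, {stalled - threshold}s over threshold")
print(io_frames_in_stack(TRACE))
```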

Practical Troubleshooting Checklist

Use Oracle’s Cluster Health Monitor (CHM) output, if available, to review resource and network usage at the time of the error.

If CHM is not present, install OSWBB to capture OS‑level metrics.

Check for UDP/IP packet loss or other network errors.

Verify that all nodes share identical network settings, especially MTU; ensure switches support Jumbo Frames when used.

Confirm that CPU utilization is not saturated and that sufficient memory is available.

Investigate whether the instance experienced a database hang or severe performance degradation before eviction.
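The MTU check in this list is easy to get wrong by eye when a cluster has several nodes and interfaces. Below is a minimal Python sketch that compares MTU values once they have been collected from each node. How the values are gathered is left out, and the node names, interface names and MTU numbers in the example are made up for illustration.

```python
def mtu_mismatches(mtu_by_node):
    """Return {interface: {node: mtu}} for interfaces whose MTU differs."""
    by_interface = {}
    for node, interfaces in mtu_by_node.items():
        for interface, mtu in interfaces.items():
            by_interface.setdefault(interface, {})[node] = mtu
    # Keep only interfaces where the nodes disagree.
    return {
        interface: values
        for interface, values in by_interface.items()
        if len(set(values.values())) > 1
    }


# Hypothetical values collected from each node's interconnect interface.
collected = {
    "node1": {"en1": 9000},
    "node2": {"en1": 1500},
}
print(mtu_mismatches(collected))
```

An empty result means every interface reports the same MTU on all nodes. It says nothing about the switch, so Jumbo Frame support there still has to be confirmed separately.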

For additional details, see the Oracle MOS document “Top 5 issues for Instance Eviction” (Doc ID 1374110.1).

Written by

ITPUB
