
Why Did My Oracle RAC Node Fail to Start with SQLPLUS? A Deep Dive into HAIP and Environment Variables

This article recounts the troubleshooting of an Oracle 11gR2 RAC cluster in which a newly added node could not be started with SQLPLUS. It covers the diagnostic steps, the discovery of the root cause (stray environment variables that kept the instance from obtaining the correct HAIP address), and the final resolution.

dbaplus Community

Problem Solving Process

During a data-mart project, an existing Oracle RAC 11gR2 cluster on x86 was expanded from a 2-node to a 4-node configuration. After the second pair of nodes was added, the new RAC instance could not be started with SQLPLUS and had to be started with the cluster commands instead. This contradicted the expectation that instances can be started on any node, in any order.

Initial testing showed that starting node 1 first and then node 2 succeeded, but when node 2 was started first, node 1 failed at the NOMOUNT stage with a cluster communication error. The ALERT log indicated a loss of inter-node communication.

Various cluster diagnostics were run, including crsctl status resource -t, crsctl check cluster -all, oifcfg, olsnodes, and srvctl. All returned normal results, providing no clues.
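The checks listed above can be collected in one pass with a small script. The following is a minimal sketch, not the procedure the author used: the two crsctl invocations come from the article, while the arguments given to oifcfg, olsnodes, and srvctl, and the db_name parameter, are assumptions added for illustration.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Tuple


def build_commands(db_name: str) -> List[List[str]]:
    """Return the diagnostic commands as argument lists.

    The two crsctl invocations are quoted from the article. The arguments
    to oifcfg, olsnodes and srvctl are assumptions, and db_name is a
    hypothetical placeholder for the database name.
    """
    return [
        ["crsctl", "status", "resource", "-t"],
        ["crsctl", "check", "cluster", "-all"],
        ["oifcfg", "getif"],                               # assumed subcommand
        ["olsnodes", "-n"],                                # assumed flag
        ["srvctl", "status", "database", "-d", db_name],   # assumed arguments
    ]


def run_diagnostics(
    commands: List[List[str]],
    runner: Callable = subprocess.run,
) -> Dict[str, Tuple[Optional[int], str]]:
    """Run each command and map its command line to (return code, output).

    A command that cannot be launched or that times out is recorded with a
    return code of None and the error text, so one failure does not stop
    the remaining checks.
    """
    results: Dict[str, Tuple[Optional[int], str]] = {}
    for cmd in commands:
        label = " ".join(cmd)
        try:
            proc = runner(cmd, capture_output=True, text=True, timeout=60)
            output = (proc.stdout or proc.stderr or "").strip()
            results[label] = (proc.returncode, output)
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[label] = (None, str(exc))
    return results
```

Calling run_diagnostics(build_commands("mydb")) as the Grid Infrastructure owner on each node and comparing the results side by side makes differences between nodes easier to spot than reading each output in isolation. In this case every check came back normal, which is itself a hint that the fault lay outside the clusterware configuration.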

Further log inspection revealed that the private network IP logged in the ALERT file (169.254.183.196) differed from the configured private IP (192.168.110.12). Research showed that this discrepancy comes from Oracle's HAIP (Highly Available IP) feature, introduced in version 11.2.0.2, which assigns link-local 169.254.x.x addresses to the private interconnect.
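A quick way to tell a HAIP address from a statically configured one is to test whether it falls in the IPv4 link-local block 169.254.0.0/16. The sketch below uses only Python's standard ipaddress module; the function name and the category labels are illustrative choices, not Oracle terminology.

```python
import ipaddress


def classify_interconnect_ip(addr: str) -> str:
    """Classify an address seen in a log as HAIP-style or not.

    The link-local test must come first, because Python also reports
    169.254.0.0/16 addresses as private.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4 and ip.is_link_local:
        return "link-local (HAIP range 169.254.0.0/16)"
    if ip.is_private:
        return "private (RFC 1918 or similar)"
    return "other"


# Addresses taken from the article.
for sample in ("169.254.183.196", "192.168.110.12", "172.31.0.116"):
    print(sample, "->", classify_interconnect_ip(sample))
```

Note that 172.31.0.116 also classifies as private here. The article calls it the public IP in the RAC sense, meaning the address on the cluster's public (client-facing) network, not an Internet-routable address.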

Additional investigation uncovered that when the problematic node was started with SQLPLUS, the cluster IP shown was the public IP (172.31.0.116) instead of the private IP, suggesting HAIP was incorrectly using the public network.

Comparing the two nodes' ALERT logs showed that node 1 correctly reported both the private and public IPs, while node 2 reported a GPnP error and could not obtain the proper private IP. Tracing the startup through the TRACE files revealed that a file expected under the GRID installation path could not be found.
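The side-by-side log comparison can be partly automated by extracting the IPv4 addresses each log mentions and looking at the differences. This is a rough sketch under stated assumptions: it treats the logs as plain text, does not rely on any particular alert log line format, and the function names are made up for illustration.

```python
import ipaddress
import re
from typing import Dict, Set

# Four dot-separated groups of digits, not embedded in a longer dotted
# number such as a five-part version string. A bare four-part version
# number (for example 11.2.0.2) would still match, so review the results.
IPV4_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")


def extract_ips(log_text: str) -> Set[str]:
    """Return the set of syntactically valid IPv4 addresses in the text."""
    found = set()
    for candidate in IPV4_RE.findall(log_text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        found.add(candidate)
    return found


def compare_logs(healthy_log: str, failing_log: str) -> Dict[str, Set[str]]:
    """Show which addresses appear in only one of the two logs."""
    healthy, failing = extract_ips(healthy_log), extract_ips(failing_log)
    return {
        "only_healthy": healthy - failing,
        "only_failing": failing - healthy,
        "both": healthy & failing,
    }
```

An address that shows up only in the healthy node's log, such as a 169.254.x.x HAIP address, is a strong pointer to what the failing node never obtained.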

Inspection of the operating system (SUSE Linux 11 SP2) showed that the environment variables of the oracle user differed between nodes. Node 2 had extra definitions of ORA_CRS_HOME and ORA_ASM_HOME in /etc/profile.d/oracle.sh. These variables presumably pointed the instance at the wrong Grid location, which would be consistent with the missing file seen in the trace. Renaming this file removed the extraneous variables, and after logging in again, the node started successfully with SQLPLUS and used the correct HAIP address (169.254.* network). The other nodes were then checked to confirm that they had the same corrected environment.
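Differences like this are easy to miss by eye, so it can help to capture the output of env for the oracle user on each node and diff the two captures mechanically. The following is a minimal sketch; the function names are illustrative, and it assumes one NAME=value pair per line, so values containing newlines would need extra handling.

```python
from typing import Dict


def parse_env(env_output: str) -> Dict[str, str]:
    """Parse the text printed by `env` into a name -> value mapping."""
    env: Dict[str, str] = {}
    for line in env_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            env[name.strip()] = value
    return env


def diff_env(reference: Dict[str, str], candidate: Dict[str, str]) -> Dict[str, dict]:
    """Compare a candidate node's environment against a reference node.

    Returns variables present only on the candidate ("extra"), present only
    on the reference ("missing"), and present on both with different values
    ("changed", as (reference value, candidate value) pairs).
    """
    ref_names, cand_names = reference.keys(), candidate.keys()
    return {
        "extra": {n: candidate[n] for n in cand_names - ref_names},
        "missing": {n: reference[n] for n in ref_names - cand_names},
        "changed": {
            n: (reference[n], candidate[n])
            for n in ref_names & cand_names
            if reference[n] != candidate[n]
        },
    }
```

Fed with captures from a healthy node and the failing node, the "extra" entry would have listed ORA_CRS_HOME and ORA_ASM_HOME immediately. Variables that legitimately differ per node, such as ORACLE_SID, appear under "changed" and can be ignored.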

Reflection Summary

Only checking the problematic node's ALERT log missed clues present in the healthy node's logs.

Lack of familiarity with SUSE-specific Oracle environment files (/etc/profile.d/oracle.sh) led to overlooking the root cause.

Even thorough OS configuration checks can miss hidden environment settings; reviewing all node configurations is essential.
