I have set up a two-node NW 6.5 SP8 cluster to run BorderManager 3.9 SP2. I don't have a 'Split Brain Detector' (SBD) partition; the servers only monitor each other through the LAN heartbeat signal, which is sent by the master and answered by the slave. This has worked well from a high-availability perspective, but I keep running into a situation where both nodes go 'active'.
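
For completeness: my understanding is that an SBD partition requires shared storage. If a shared device were available, I believe it could be created after the fact from the console with SBDUTIL, roughly like this (syntax is from memory, MYCLUSTER and SHAREDDEV are just placeholders, and you should check the Novell documentation before trusting any of it):

    SBDUTIL -C -N MYCLUSTER -D SHAREDDEV   (create the SBD partition on the shared device)
    SBDUTIL -F                             (confirm the node can now find an SBD partition)
    SBDUTIL -V                             (view the contents of the SBD partition)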

Usually, I have Node 0 set as both the cluster master and the host of the NBM proxy resource. Node 1 is then in standby, ready to load the proxy service and assume the proxy IP address if Node 0 dies. At some point (the interval varies from two to five days and doesn't seem to be related to network load) Node 0 decides that Node 1 has failed and shows that on the CMON console. Shortly afterwards Node 1 decides that Node 0 has failed, binds the proxy IP and cluster master IP, and loads the proxy. At that point I have two servers, each with the same Cluster Master IP and proxy IP bound and proxy.nlm loaded!
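
In case it matters, I believe the node-failure detection timing is governed by the cluster Protocol settings (Cluster object > Properties > Protocol in ConsoleOne). From memory the defaults are approximately the following; treat these numbers as my recollection rather than gospel:

    Heartbeat             1 second
    Tolerance             8 seconds
    Master Watchdog       1 second
    Slave Watchdog        8 seconds
    Maximum Retransmits   30

If anything is going to be tuned to ride out a missed heartbeat, I assume Tolerance and the watchdog values would be the knobs, but that's only my guess.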

I can access Node 0 through rconj and it appears to be working fine. If I do a 'display secondary ipaddress' I can see that it has both the proxy IP and the Cluster Master IP bound. The same is true of Node 1. I unload the proxy on Node 0 and reset the server. When it comes back up it joins the cluster just fine, and there doesn't appear to be any other problem.
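
For what it's worth, this is roughly the sequence I run on the node I'm taking down. The address display, proxy unload and server reset are the steps I described above; the cluster commands are standard NCS console commands, and 'cluster leave' in particular is just my assumption about the cleanest way to pull the node out before restarting:

    cluster view                    (see which nodes this server thinks are in the cluster, and who is master)
    display secondary ipaddress     (confirm which secondary IPs are bound on this node)
    unload proxy                    (stop the duplicate proxy)
    cluster leave                   (take the node out of the cluster before the restart)
    reset server                    (restart; the node rejoins cleanly afterwards)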

Has anyone else seen this behavior? (Craig???)