I thought I had heartbeat working on two SLES10 boxes with OES2, but now each node thinks the other is dead. As a result, both nodes bring up the shared IP (10.4.55.50) and both start rcnovell-ipsmd and rcnovell-idsd.

I came back from vacation to find it behaving this way.

Logs from my test environment:
NODE A (prt1):
heartbeat[12293]: 2010/02/17_10:27:47 info: Link prt1:eth0 up.
heartbeat[12293]: 2010/02/17_10:29:46 WARN: node prt2: is dead
heartbeat[12293]: 2010/02/17_10:29:46 info: Comm_now_up(): updating status to active
heartbeat[12293]: 2010/02/17_10:29:46 info: Local status now set to: 'active'
heartbeat[12293]: 2010/02/17_10:29:46 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
heartbeat[12293]: 2010/02/17_10:29:46 WARN: No STONITH device configured.
heartbeat[12293]: 2010/02/17_10:29:46 WARN: Shared disks are not protected.
heartbeat[12293]: 2010/02/17_10:29:46 info: Resources being acquired from prt2.
heartbeat[13278]: 2010/02/17_10:29:46 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90 gid 90 (pid 13278)
harc[13279]: 2010/02/17_10:29:46 info: Running /etc/ha.d/rc.d/status status
mach_down[13308]: 2010/02/17_10:29:46 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[13308]: 2010/02/17_10:29:46 info: mach_down takeover complete for node prt2.
heartbeat[12293]: 2010/02/17_10:29:46 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[12293]: 2010/02/17_10:29:46 info: mach_down takeover complete.
IPaddr[13352]: 2010/02/17_10:29:46 INFO: Resource is stopped
heartbeat[13280]: 2010/02/17_10:29:46 info: Local Resource acquisition completed.
harc[13404]: 2010/02/17_10:29:46 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[13404]: 2010/02/17_10:29:46 received ip-request-resp 10.4.55.50 OK yes
ResourceManager[13425]: 2010/02/17_10:29:46 info: Acquiring resource group: prt1 10.4.55.50 novell-idsd novell-ipsmd
IPaddr[13452]: 2010/02/17_10:29:46 INFO: Resource is stopped
ResourceManager[13425]: 2010/02/17_10:29:46 info: Running /etc/ha.d/resource.d/IPaddr 10.4.55.50 start
IPaddr[13528]: 2010/02/17_10:29:47 INFO: Using calculated nic for 10.4.55.50: eth0
IPaddr[13528]: 2010/02/17_10:29:47 INFO: Using calculated netmask for 10.4.55.50: 255.255.0.0
IPaddr[13528]: 2010/02/17_10:29:47 INFO: eval ifconfig eth0:0 10.4.55.50 netmask 255.255.0.0 broadcast 10.4.255.255
IPaddr[13511]: 2010/02/17_10:29:47 INFO: Success
ResourceManager[13425]: 2010/02/17_10:29:47 info: Running /etc/init.d/novell-idsd start
ResourceManager[13425]: 2010/02/17_10:29:47 info: Running /etc/init.d/novell-ipsmd start
heartbeat[12293]: 2010/02/17_10:29:56 info: Local Resource acquisition completed. (none)
heartbeat[12293]: 2010/02/17_10:29:56 info: local resource transition completed.


NODE B (prt2):
Feb 17 10:26:24 prt2 heartbeat: [7642]: info: Link prt2:eth0 up.
Feb 17 10:28:23 prt2 heartbeat: [7642]: WARN: node prt1: is dead
Feb 17 10:28:23 prt2 heartbeat: [7642]: info: Comm_now_up(): updating status to active
Feb 17 10:28:23 prt2 heartbeat: [7642]: info: Local status now set to: 'active'
Feb 17 10:28:23 prt2 heartbeat: [7642]: info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
Feb 17 10:28:23 prt2 heartbeat: [7642]: WARN: No STONITH device configured.
Feb 17 10:28:23 prt2 heartbeat: [7642]: WARN: Shared disks are not protected.
Feb 17 10:28:23 prt2 heartbeat: [7642]: info: Resources being acquired from prt1.
Feb 17 10:28:23 prt2 heartbeat: [8580]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 90 gid 90 (pid 8580)
harc[8581]: 2010/02/17_10:28:23 info: Running /etc/ha.d/rc.d/status status
Feb 17 10:28:24 prt2 heartbeat: [8582]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys prt2] to acquire.
Feb 17 10:28:24 prt2 heartbeat: [7642]: info: Initial resource acquisition complete (T_RESOURCES(us))
mach_down[8607]: 2010/02/17_10:28:24 info: Taking over resource group 10.4.55.50
ResourceManager[8646]: 2010/02/17_10:28:24 info: Acquiring resource group: prt1 10.4.55.50 novell-idsd novell-ipsmd
IPaddr[8675]: 2010/02/17_10:28:24 INFO: Resource is stopped
ResourceManager[8646]: 2010/02/17_10:28:24 info: Running /etc/ha.d/resource.d/IPaddr 10.4.55.50 start
IPaddr[8760]: 2010/02/17_10:28:24 INFO: Using calculated nic for 10.4.55.50: eth0
IPaddr[8760]: 2010/02/17_10:28:24 INFO: Using calculated netmask for 10.4.55.50: 255.255.0.0
IPaddr[8760]: 2010/02/17_10:28:24 INFO: eval ifconfig eth0:0 10.4.55.50 netmask 255.255.0.0 broadcast 10.4.255.255
IPaddr[8739]: 2010/02/17_10:28:24 INFO: Success
ResourceManager[8646]: 2010/02/17_10:28:24 info: Running /etc/init.d/novell-idsd start
ResourceManager[8646]: 2010/02/17_10:28:25 info: Running /etc/init.d/novell-ipsmd start
mach_down[8607]: 2010/02/17_10:28:25 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[8607]: 2010/02/17_10:28:25 info: mach_down takeover complete for node prt1.
Feb 17 10:28:25 prt2 heartbeat: [7642]: info: mach_down takeover complete.
Feb 17 10:28:34 prt2 heartbeat: [7642]: info: Local Resource acquisition completed. (none)
Feb 17 10:28:34 prt2 heartbeat: [7642]: info: local resource transition completed.
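
To confirm the symptom without reading both logs by hand, here is a minimal sketch in Python. It only looks for the "node X: is dead" warning that appears in the logs above and reports whether each node has declared the other one dead. The function names are my own (hypothetical), and the two sample lines are copied from the logs above; in practice you would read each node's heartbeat log file instead. It does not diagnose why the nodes cannot see each other.

```python
import re

# Matches the heartbeat warning seen in both logs,
# e.g. "WARN: node prt2: is dead" (captures the node name).
DEAD_RE = re.compile(r"WARN: node (\S+): is dead")


def nodes_declared_dead(log_text):
    """Return the set of node names that this log declares dead."""
    return set(DEAD_RE.findall(log_text))


def is_split_brain(node_a, log_a, node_b, log_b):
    """True if node_a's log declares node_b dead and vice versa."""
    return (node_b in nodes_declared_dead(log_a)
            and node_a in nodes_declared_dead(log_b))


if __name__ == "__main__":
    # Sample lines copied from the logs above (one per node).
    prt1_log = "heartbeat[12293]: 2010/02/17_10:29:46 WARN: node prt2: is dead\n"
    prt2_log = "Feb 17 10:28:23 prt2 heartbeat: [7642]: WARN: node prt1: is dead\n"
    print(is_split_brain("prt1", prt1_log, "prt2", prt2_log))
```

With the two sample lines this reports that both nodes consider each other dead, which matches what I see: both take over 10.4.55.50 at the same time.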