Hi folks,
first of all, I apologize for my English.

We are running NetWare 6.5 SP5a in a two-node cluster and use McAfee NetShield 4.63 with scan engine 5.2.00 and definitions 4.0.xxxx. One cluster node holds all resources (FTP, master IP, and four volumes totalling nearly 1 TB).
NetShield is configured to scan the volumes on a schedule. While the largest volume is being scanned, the slave node is periodically killed by a split brain, and it happens at nearly the same time each run. Even if I move the largest volume (600 GB) over to the other node, the split brain still occurs.
If all resources are held by one node, the slave node's ABEND.LOG shows something like this:

“Abend information for server xxxxxx Monday, 31 March 2008 2.34.43,100
Server xxxxxx halted Monday, 31 March 2008 2.34.43,100
Abend 1 on P00: Server-5.70.05-0: At least one of the nodes is Alive in the old master's node partition.
This node is NOT in the old master's node partition.
For more information, consult technical information document 10053882 in the knowledgebase on NOVELL: Support.
Registers:
CS = 0060 DS = 007B ES = 007B FS = 007B GS = 007B SS = 0068
EAX = 95016476 EBX = 95EC9520 ECX = FE007CA0 EDX = 95012DE8
ESI = 00000000 EDI = 95EC9520 EBP = 95012DEC ESP = 95012DE0
EIP = 992BF0E3 FLAGS = 00000206
992BF0E3 83C404 ADD ESP, 00000004
EIP in CLSTRLIB.NLM at code start +000060E3h”

If the largest volume is held by the slave node, its ABEND.LOG shows something like this:

“Abend information for server xxxxxx Wednesday, 23 April 2008 0.05.53,964
Server xxxxxx halted Wednesday, 23 April 2008 0.05.53,964
Abend 1 on P00: Server-5.70.05-0: Ate Poison Pill in SbdWriteNodeTick given by some other node.
Registers:
CS = 0060 DS = 007B ES = 0068 FS = 007B GS = 007B SS = 0068
EAX = 9624C666 EBX = 9624D0E0 ECX = FE007CA0 EDX = 98001EAC
ESI = 00000000 EDI = 9624D0E0 EBP = 98001EB0 ESP = 98001EA4
EIP = 9802E0E3 FLAGS = 00000206
9802E0E3 83C404 ADD ESP, 00000004
EIP in CLSTRLIB.NLM at code start +000060E3h”

Do you have any ideas on how to fix this problem?

Thanks in advance.

Regards

Thomas


P.S. We are running several other clusters that are configured identically; the error does not occur on any of those.