23-Oct-2009, 04:16 PM
Junior Member
 
Join Date: Feb 2008
Posts: 1
babilon 0 reputation points
Pool Corrupted after poison pill to nodes

Hi folks,

We have a two-node GroupWise 8 SP1 cluster running on OES2 SP1 (SLES 10 SP2), attached to an EMC SAN over Fibre Channel using EMC PowerPath.

When the resource (called POA_DIRETOR_SERVER) tries to load, the other node sends a poison pill and the server reboots. I found that all nodes had rebooted during the night.

About every three months, one of the resources (a pool) gets corrupted. The most recent log:

NSSLOG ==> [Error] zlssMSAP.c[1899]
Oct 21 20:17:43 srv-corp-120 kernel: Oct 21, 2009 7:17:43 pm NSS<ZLSS>-4.11b-xxxx:
Oct 21 20:17:43 srv-corp-120 kernel: MSAP: Pool "POA_DIRETOR" ownership lost, pool may have been corrupted
Oct 21 20:17:43 srv-corp-120 kernel: by being activated from two servers at the same time.
...
Oct 22 09:47:21 srv-corp-120 kernel: err=20801 comnVol.c[894]
Oct 22 09:49:20 srv-corp-120 kernel: err=20801 comnVol.c[894]
Oct 22 09:49:39 srv-corp-120 sshd[16218]: Accepted keyboard-interactive/pam for root from 10.100.207.6 port 59479 ssh2
Oct 22 09:50:00 srv-corp-120 kernel: lsa_vol_statfs: zOpen = 20407
Oct 22 09:50:06 srv-corp-120 kernel: lsa_vol_statfs: zOpen = 20407
Oct 22 09:51:12 srv-corp-120 sshd[22740]: Accepted keyboard-interactive/pam for root from 172.22.0.101 port 1149 ssh2
Oct 22 09:51:15 srv-corp-120 kernel: lsa_vol_statfs: zOpen = 20407
Oct 22 09:51:15 srv-corp-120 kernel: lsa_vol_statfs: zOpen = 20407
Oct 22 09:52:24 srv-corp-120 kernel: err=20801 comnVol.c[894]
Oct 22 09:52:56 srv-corp-120 smdrd[19377]: Received Leave Event for POA_DIRETOR_SERVER
Oct 22 09:52:56 srv-corp-120 smdrd[19377]: Target name POA_DIRETOR_SERVER could not be de-advertised from SLP
Oct 22 09:53:44 srv-corp-120 kernel: CLUSTER-<WARNING>-<6077>: The cluster has lost communication with node [srv-corp-121].
Oct 22 09:53:44 srv-corp-120 kernel: Node [srv-corp-121] may have failed or experiencing other problems.
Oct 22 09:53:44 srv-corp-120 kernel: To ensure cluster stability, this node has sent a poison pill to node [srv-corp-121].
Oct 22 09:53:44 srv-corp-120 kernel: Epoch for this node is higher than for some other node.
Oct 22 09:53:44 srv-corp-120 kernel: Other node is slow to update epoch and bitmask (slow or dead).
Oct 22 09:58:53 srv-corp-120 syslog-ng[13581]: syslog-ng version 1.6.8 starting
Oct 22 09:58:53 srv-corp-120 ifup: lo
Oct 22 09:58:53 srv-corp-120 syslog-ng[13581]: Changing permissions on special file /dev/xconsole
Oct 22 09:58:53 srv-corp-120 syslog-ng[13581]: Changing permissions on special file /dev/tty10
Oct 22 09:58:53 srv-corp-120 dbus-daemon: nds_nss_GetGroupsbyMember: failed to init socket, status = -1
Oct 22 09:58:53 srv-corp-120 dbus-daemon: nds_nss_GetGroupsbyMember: failed to init socket, status
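For anyone sifting through a similar log, here is a minimal Python sketch that pulls only the pool and cluster events out of syslog lines like the ones above. It is not a Novell tool. The keyword list and the function name `cluster_events` are my own assumptions, based only on the messages that show up in this excerpt, so adjust them to your logs.

```python
import re

# Substrings taken from the log excerpt above (an assumption, not an
# official list of NSS/NCS messages); extend as needed.
KEYWORDS = ("MSAP:", "poison pill", "lost communication", "err=")

# Classic syslog layout: "Oct 22 09:53:44 host source: message"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<source>[^:]+): (?P<message>.*)$"
)


def cluster_events(lines):
    """Return (stamp, host, message) for syslog lines that contain a keyword."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a syslog line (blank, "...", wrapped text)
        message = match.group("message")
        if any(keyword in message for keyword in KEYWORDS):
            events.append((match.group("stamp"), match.group("host"), message))
    return events


if __name__ == "__main__":
    sample = [
        'Oct 21 20:17:43 srv-corp-120 kernel: MSAP: Pool "POA_DIRETOR" '
        "ownership lost, pool may have been corrupted",
        "Oct 22 09:52:56 srv-corp-120 smdrd[19377]: Received Leave Event "
        "for POA_DIRETOR_SERVER",
        "Oct 22 09:53:44 srv-corp-120 kernel: To ensure cluster stability, "
        "this node has sent a poison pill to node [srv-corp-121].",
    ]
    for stamp, host, message in cluster_events(sample):
        print(stamp, host, message)
```

Running the same filter over the logs of both nodes and comparing the timestamps should show which node lost pool ownership first and which one sent the poison pill.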

To correct the problem, I rebuilt the pool with the ravsui command.

Does anybody know how I can prevent this sort of thing from happening again?

Thanks.
All times are GMT -6.


© 2007 Novell, Inc. All Rights Reserved.
