Hello,

This is the second time in a week our SLES 10 OES2 GroupWise server has dropped the iSCSI connection to our IBM DS300 SAN.

The SAN has three servers with volumes mounted on it, and only this server is dropping its connection. All the volumes are NSS volumes. The other two servers are a NetWare box and another SLES 10 OES2 server; both are keeping their connections.

At 21:30 a cron job starts the GroupWise backup. Here is a snippet from the messages log:

Oct 14 21:30:01 fvlgw /usr/sbin/cron[12119]: (root) CMD (/opt/novell/groupwise/agents/bin/gwbackup)
Oct 14 21:30:25 fvlgw kernel: connection0:0: iscsi: detected conn error (1011)
Oct 14 21:30:26 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:30:26 fvlgw iscsid: detected iSCSI connection 0:0 error (1011) state (3)
Oct 14 21:30:27 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:32:18 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:36:28 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:39:11 fvlgw kernel: connection0:0: iscsi: detected conn error (1011)
Oct 14 21:39:12 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:39:12 fvlgw iscsid: detected iSCSI connection 0:0 error (1011) state (3)
Oct 14 21:39:13 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:41:25 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:41:25 fvlgw kernel: connection0:0: iscsi: detected conn error (1011)
Oct 14 21:41:26 fvlgw iscsid: detected iSCSI connection 0:0 error (1011) state (3)
Oct 14 21:41:27 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:42:35 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
Oct 14 21:43:12 fvlgw iscsid: connection0:0 is operational after recovery (1 attempts)
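To see whether the errors cluster around the backup window, the entries above could be tallied per hour. This is only a rough sketch: the function name `tally_iscsi_events` is made up for illustration, and it assumes the syslog line format shown in the excerpt. It counts only the kernel "conn error" lines as errors, since iscsid appears to report the same event a second time.

```python
import re
from collections import Counter

# Assumed line format (taken from the excerpt above):
#   Oct 14 21:30:25 fvlgw kernel: connection0:0: iscsi: detected conn error (1011)
# Group 1 captures "Mon DD HH" so events can be bucketed per hour.
ERROR_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}):\d{2}:\d{2} \S+ kernel: .*conn error \((\d+)\)"
)
RECOVERY_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}):\d{2}:\d{2} \S+ iscsid: .*operational after recovery"
)

def tally_iscsi_events(lines):
    """Count kernel iSCSI connection errors and iscsid recoveries per hour.

    `lines` is any iterable of log lines (a list or an open file).
    Returns two Counters keyed by "Mon DD HH".
    """
    errors, recoveries = Counter(), Counter()
    for line in lines:
        m = ERROR_RE.match(line)
        if m:
            errors[m.group(1)] += 1
            continue
        m = RECOVERY_RE.match(line)
        if m:
            recoveries[m.group(1)] += 1
    return errors, recoveries
```

It can be fed an open file, for example `tally_iscsi_events(open("/var/log/messages"))`, and the two counters compared hour by hour against the backup schedule.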

Finally, a little after midnight, this is where I think it drops the volumes:
Oct 15 00:22:29 fvlgw kernel: NSSLOG ==> [Error] comnPool.c[2520]
Oct 15 00:22:29 fvlgw kernel: Oct 15, 2008 12:22:29 am NSS<COMN>-4.10a-xxxx:
Oct 15 00:22:29 fvlgw kernel: Pool BACKUPPOOL: System data error 20012(beastTree.c[514]). Block 672854(file block -672854)(ZID 1)
Oct 15 00:22:29 fvlgw kernel: NSSLOG ==> [Error] comnVol.c[9203]
Oct 15 00:22:29 fvlgw kernel: Oct 15, 2008 12:22:29 am NSS<COMN>-4.10a-xxxx:
Oct 15 00:22:29 fvlgw kernel: Volume BACKUPVOL: System data error 20012(beastTree.c[514]). Block 672854(file block -672854)(ZID 1)

This SAN/server combination had been running with no connection issues for about six months, and now there have suddenly been two drops in a week. To reconnect, I have to go into YaST and use the iSCSI Initiator module. It appears to reconnect with no problems.

Does anyone have any ideas about where I can look for more error logs, or what the problem might be?

Also, the server is fully patched via RUG.

Thanks!

Matt