We have a 2-node cluster where every 2-3 hours both servers crash within
3-6 seconds of each other. The first server to go reports a generic Page
fault exception in Server.nlm. The second node then reports a cluster
castout, fatal SAN read error, thread owned by sbd.nlm.

The shared storage is connected to a SAN via iSCSI. We have troubleshot
the SAN itself and are 95% sure there is no direct HW problem there. We
have non-clustered MS and NW servers using the SAN over iSCSI without a
problem.

The problem started with nw65sp5 where the servers would just freeze up
and not respond. We then applied sp6 with all the post-sp patches (iscsi,
winsock, tcp, etc.) and now we get the abends. Any suggestions on where to
troubleshoot? At present we have only one node running in the cluster to
see whether the problem lies with one machine and then cascades to the
second, but we are running out of ideas.

Does anyone know of a way to really debug the iSCSI connection /
initiator? We've checked the LAN traffic on the servers and on the SAN
switch with no evidence of any problem, but I suspect that at some point
the cluster can no longer talk to its SBD partition and then dies. Thanks
for any suggestions.