I first posted this thread in the OES/NCS sub-forum, but I was told there
that the problem is likely caused by the NDS daemon (ndsd), so I am reposting it here.

System and versions:
Novell Cluster with 9 nodes and 44 cluster volumes under
SLES11 SP1/OES11 with the latest patches applied
Hardware: HP Blades and HP Storage (EVA7000)

For several weeks we have had a problem with cluster volumes and the NDS daemon.
Almost daily, the cluster volumes of one cluster node become unavailable
to clients. The command

# rcndsd status

returns "Unable to get server status".

In this case the server in question has to be rebooted. Because of the
reboot, its cluster volumes migrate to other cluster nodes and become
functional again.
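Since the failure first shows up as a change in what "rcndsd status" prints, a small watchdog could flag it before clients notice. The following is a minimal Python sketch, not a Novell tool: it only matches the two strings quoted in this thread ("Unable to get server status" and "dead"), because the wording of the healthy output is not shown here. The function names are hypothetical.

```python
import re
import subprocess

# Strings quoted in this thread. The healthy-case wording of
# "rcndsd status" is not shown, so anything else is reported as "other".
UNRESPONSIVE = "Unable to get server status"
DEAD_RE = re.compile(r"\bdead\b", re.IGNORECASE)


def classify_ndsd_status(output: str) -> str:
    """Map the text printed by 'rcndsd status' to a coarse state (hypothetical helper)."""
    if UNRESPONSIVE in output:
        return "unresponsive"
    if DEAD_RE.search(output):
        return "dead"
    return "other"


def check_ndsd() -> str:
    """Run the same command as in the post (needs root on an OES node)."""
    result = subprocess.run(["rcndsd", "status"], capture_output=True, text=True)
    # Inspect both streams, since the error text may go to either.
    return classify_ndsd_status(result.stdout + result.stderr)
```

For example, check_ndsd() could be run from cron on each node, raising an alert whenever it returns "unresponsive" or "dead".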
After some hours another server shows the same symptoms. It seems
that there are one or two "favorite" cluster volumes that are always

Additional information:
I applied TD 7012793 to one cluster node. The only change: when the
cluster volumes become unavailable to clients,

# rcndsd status

no longer returns an error. But when the cluster volume is migrated
(via iManager), ndsd on the server it is migrated away from crashes,
and rcndsd status then reports "dead".

An excerpt from /var/log/messages follows. Using iManager, I migrated the
cluster volume C3-NL3K12P-SERVER, which had become unavailable to clients,
away from server nc308:

Sep 25 06:13:01 nc308 /usr/sbin/cron[22602]: (root) CMD
Sep 25 06:14:48 nc308 [XTCOM]: pam_sm_authenticate in pam_ncl.c (novell-client's pam)is called
Sep 25 06:15:01 nc308 /usr/sbin/cron[22639]: (root) CMD (
Sep 25 06:16:15 nc308 sshd[22665]: Accepted keyboard-interactive/pam for root from port 58548 ssh2
Sep 25 06:19:28 nc308 smdrd[16219]: Received Leave Event for
Sep 25 06:19:28 nc308 smdrd[16219]: Target name C3-NL3K12P-SERVER successfully de-advertised from SLP
Sep 25 06:19:28 nc308 kernel: [54445.897985] ndsd[22110]: segfault at 58 ip 00007fb6b44962b9 sp 00007fb69cec1be0 error 4 in
Sep 25 06:19:29 nc308 smdrd[16219]: Could not start TCP listener on
Sep 25 06:19:32 nc308 adminus daemon: umounting volume NL3K12S lazy=1
Sep 25 06:19:34 nc308 kernel: [54451.742301] NSSLOG ==> [MSAP]
Sep 25 06:19:34 nc308 kernel: [54451.742303] Pool "NL3K12P" - MSAP
Sep 25 06:20:01 nc308 /usr/sbin/cron[22848]: (root) CMD (
Sep 25 06:21:50 nc308 shutdown[22906]: shutting down for system reboot
Sep 25 06:21:51 nc308 init: Switching to runlevel: 6
Sep 25 06:21:53 nc308 kernel: [54591.102010] bootsplash: status on console 0 changed to on
Sep 25 06:21:57 nc308 multipathd: 36001438012599fc20000400000c40000: stop event checker thread (140680465872640)
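To check whether the ndsd crash is tied to the migration, the log excerpt can be scanned for an ndsd segfault that follows an smdrd "Received Leave Event" within a short window. The Python sketch below rests on several assumptions: each syslog entry sits on one line (the excerpt as originally posted was wrapped), the caller supplies the year (classic syslog timestamps omit it), and the 60-second window is arbitrary. The function names are hypothetical.

```python
import re
from datetime import datetime

# Classic syslog prefix: "Sep 25 06:19:28 nc308 <rest of entry>"
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
# Kernel segfault report, e.g. "ndsd[22110]: segfault at 58 ip ... sp ... error 4"
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
)
LEAVE_MARK = "Received Leave Event"


def parse_time(stamp: str, year: int) -> datetime:
    """Syslog omits the year, so the caller has to provide it."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_crashes_after_leave(lines, year, window_s=60):
    """Return (leave_time, segfault_fields) pairs where an ndsd segfault
    follows an smdrd 'Received Leave Event' within window_s seconds."""
    last_leave = None
    hits = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m:
            continue  # skip anything that is not a syslog entry
        when = parse_time(m.group(1), year)
        rest = m.group(3)
        if LEAVE_MARK in rest:
            last_leave = when  # remember the most recent leave event
            continue
        crash = SEGFAULT_RE.search(rest)
        if crash and crash.group("proc") == "ndsd" and last_leave is not None:
            if (when - last_leave).total_seconds() <= window_s:
                hits.append((last_leave, crash.groupdict()))
    return hits
```

In the posted excerpt the leave event and the ndsd segfault share the timestamp 06:19:28. On x86-64 Linux, "error 4" in a kernel segfault line denotes a user-mode read of an unmapped address, and a fault address of 0x58 usually points to a dereference of a near-NULL pointer. These details may be useful when opening a service request.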

mten's Profile: https://forums.netiq.com/member.php?userid=717
View this thread: https://forums.netiq.com/showthread.php?t=48785