Hi,

I am trying to do a test restore of our HOME volume. It fails and leaves
the server in a weird state.

It's OES 11 SP 1, Tivoli Client 6.2.5, Tivoli Server 5.5; the Volume has
about 1.1 TB, about 6 million objects; the target volume for the restore is
a newly created NSS pool and volume on a SAN device. The server is part of a
2-server cluster; while I was investigating this, all production cluster
resources were on the other node.

The restore starts and runs for several hours, then the Tivoli Client simply
stops doing anything, and the Tivoli Server no longer has a session to this
client.

There are symptoms that make me think it's an OES issue, not a Tivoli issue:

top shows a load that stays between 4 and 6, which is not the case in
normal operation:


top - 08:22:52 up 15:34, 3 users, load average: 4.20, 4.22, 4.19
Tasks: 328 total, 1 running, 327 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.2%us, 12.5%sy, 0.0%ni, 68.3%id, 0.8%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 16079M total, 9131M used, 6948M free, 288M buffers
Swap: 994M total, 0M used, 994M free, 3238M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11319 root 20 0 1176m 728m 20m S 8 4.5 71:58.07 ndsd
14391 root 20 0 141m 9.8m 3068 S 5 0.1 0:00.48 sfcbd
9349 root 20 0 222m 10m 4176 S 2 0.1 5:37.37 httpstkd


If ndsd is stopped (killed), along with cifs, afp, httpstkd and sfcbd, the
load stays the same, even though no process seems to consume CPU.
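
On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so a high load with idle CPUs can come from tasks blocked in the kernel. A minimal sketch for listing such tasks by reading /proc follows; the function names are made up for illustration, and it only looks at the main thread of each process, not at /proc/<pid>/task.

```python
import os

def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unreadable entry
        if state == "D":
            found.append((pid, comm))
    return found
```

Running `blocked_tasks()` as root a few times in a row would show whether the same tasks stay in state D, which might point to what is blocked.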


ndsd does not shut down properly, I think because of a problem with the NCP
server; it finally gets killed:

May 09 09:03:04 About to stop Novell eDirectory server on host
...
May 09 09:03:16 Shutdown NCPServer
WARNING: ndsd process is still running. Killing ndsd.
May 09 09:03:04 Stopped Novell eDirectory server on host


Restarting ndsd fails:

May 09 09:12:04 Unable to bind to address xxx.xxx.xxx.xxx.524. Address already in use
May 09 09:12:04 Could not bind to xxx.xxx.xxx.xxx:524 Address already in use
May 09 09:12:05 Unable to bind to address xxx.xxx.xxx.xxx.524. Address already in use
May 09 09:12:05 Could not bind to xxx.xxx.xxx.xxx:524 Address already in use
May 09 09:12:06 MASV closed
May 09 09:12:06 SPM DClient closed
[ -- DHost Logging STOPPED Fri May 9 09:12:06 2014 -- ]
May 09 09:12:06 Shutdown NCPServer
May 09 09:12:06 Shutdown NCPServer ... beginning check for packets in process
May 09 09:12:06 DSDeregisterSignalHandler succeeded for signal 63
May 09 09:12:06 ... NCPServer halted
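
The "Address already in use" lines suggest that something still holds a socket on port 524 after ndsd was killed. A minimal sketch for checking this from the contents of /proc/net/tcp (IPv4 only) follows; the function name is hypothetical. In that file the local address is hex IP:port, and state 0A means LISTEN.

```python
def sockets_on_port(proc_net_tcp_text, port=524):
    """Return (local_address, state, inode) for sockets bound to `port`."""
    hits = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        local, state, inode = fields[1], fields[3], fields[9]
        # The port is the hex number after the last ':' of the local address.
        local_port = int(local.rsplit(":", 1)[1], 16)
        if local_port == port:
            hits.append((local, state, inode))
    return hits
```

It could be called as `sockets_on_port(open("/proc/net/tcp").read())`; any entry returned while ndsd is not running would mean the old socket was never released.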


Mostly (but not always) NSS commands are blocked; for example, nss /pools
just hangs and shows nothing.

Mostly (but not always) the system fails to shut down: the command is just
ignored, nothing happens, and a hardware reset is required.


Any ideas? (I'll probably open an SR, but usually the forums are quicker... :-) )

Thanks,
Mirko