Hi

We have a strange one here. The box is an IBM 345 that is one node of a
two-node cluster. It was originally built using the NW6.5 SP2 overlay CDs,
and we had the same nightmares as everyone else for nearly a year until
the release of SP3a, at which point the product became fit for purpose,
staying up for months at a time with no problems.

As SP4a had been out a while, I decided to patch this two-node cluster as
a test. One node installed OK, but there were problems with this one.

In common with a lot of people, I had the installation stop towards the
end because it couldn't unload Java. The server would then not restart
without abending due to Java. I booted using F8 to the point before any
Java loaded so that I could run the service pack again. This time it went
in without any reported errors.

NB: On the first pass I backed up the files to be changed and elected not
to update drivers. On the second pass I didn't select the backup option
and again elected not to update the drivers.

The server now appears OK, with the exception of a curious problem whereby
CPU 0 maxes out at 100% for no reason I have been able to establish.

The profile/debug output below shows that SERVER.NLM is the culprit:

Execution Profile Data by NLM
NLM Name  NLM Description  Execution Time  Processor 0  Processor 1
SERVER.NLM  NetWare Server Operating System  66.8 %  59.1 %  40.9 %
LSL.NLM  Novell NetWare Link Support Layer  17.4 %  0.0 %  100.0 %
LOADER.NLM  NetWare OS Loader  1.8 %  88.9 %  11.1 %
DS.NLM  Novell eDirectory Version 8.7.3.7 SMP  1.2 %  100.0 %  0.0 %
MSM.NLM  Novell Multi-Processor Media Support Module  0.6 %  0.0 %  100.0 %
PORTAL.NLM  CPR - Novell Remote Manager NLM  0.4 %  50.0 %  50.0 %
DSLOADER.NLM  Novell eDirectory Version 8.7.3 Loader SMP  0.2 %  100.0 %  0.0 %

One area of interest was View Statistics for the Kernel, shown below.
Note that the Normal WorkToDo's count is very high, at 10,000,000 to
12,000,000.

Event Counters By CPU
Kernel Event Name                    CPU 0       CPU 1       TOTAL
CPU Utilization                        100           4
Context Switches                     2,347         108       2,455
Interrupts                             135         176         311
Fast WorkToDo's                         14          39          53
Normal WorkToDo's               11,865,961          35  11,865,996
Spin LOCKS Asserted                  4,008       2,739       6,747
Spin LOCKS Busy                          4           4           8
Spin LOCKS Waiting loops                10          12          22
Thread Context Spin Waits                0           0           0
Enter Netware (CPU 0) Binding            2          45          47
Media Manager IOs Completed              0           6           6
AES Events - No Sleeping                 0           0           0
AES Events - Sleeping Allowed            0           0           0
Mutexes Acquired                       529       2,272       2,801
Mutexes Blocking                         2           0           2
Mutexes Allocated                        0           0           0
Mutexes Freed                            0           0           0
Semaphores Allocated                     0           0           0
Semaphores Freed                         0           0           0
Semaphore Wait Calls                   159          22         181
RW LOCK Writer Blocks                    0           0           0
RW LOCK Reader Blocks                    0           0           0
Condition Variable Blocks                0           0           0
Old Semaphores Allocated                 0           0           0
Old Semaphore Freed                      0           0           0
CPSemaphore Calls                        0           0           0
CSleepUntilInterrupt Calls              22           0          22
Yield Calls That Yielded                78           0          78
Unneeded Yields                          0           0           0
Threads Preempted                        0           0           0
Threads Moved to other CPU               0           3           3
CALL OUTs                                0         390         390
IO Adapter Polling Calls                 0   8,695,256   8,695,256
New Threads Created                      0           0           0
Page Faults                              0           0           0
CPU TLB Flushes                          0           0           0
CPU TLB Shootdowns                       0           0           0
Worker Thread Handovers                  0           0           0
WTD No Memory Available                  0           0           0
WTD Thread Create Failed                 0           0           0
WTD Max Threads In Use                   0           0           0
WTD Time NOT Expired                     0           0           0
MP WTD Max Threads In Use                0           0           0
MP WTD Time NOT Expired                  0           0           0
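
In case it helps anyone compare the two CPU columns quickly, here is a
rough Python sketch that reads rows in the format above and lists the
counters where one CPU accounts for nearly all of the events. It is not a
Novell tool. The function names (parse_rows, skewed) and the thresholds
(min_total, share) are my own assumptions, chosen only for illustration.

```python
import re

# A few rows copied from the table above; paste in the full table to check
# every counter.
SAMPLE = """\
Context Switches 2,347 108 2,455
Normal WorkToDo's 11,865,961 35 11,865,996
Spin LOCKS Asserted 4,008 2,739 6,747
Enter Netware (CPU 0) Binding 2 45 47
IO Adapter Polling Calls 0 8,695,256 8,695,256
"""

# Counter name, followed by three comma-grouped integers at the end of the
# line (CPU 0, CPU 1, TOTAL). Rows without three numbers are ignored.
ROW = re.compile(r"^(.*?)\s+([\d,]+)\s+([\d,]+)\s+([\d,]+)\s*$")


def parse_rows(text):
    """Return a list of (name, cpu0, cpu1, total) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header lines, or rows such as CPU Utilization
        name = m.group(1)
        cpu0, cpu1, total = (int(g.replace(",", "")) for g in m.groups()[1:])
        rows.append((name, cpu0, cpu1, total))
    return rows


def skewed(rows, min_total=1000, share=0.99):
    """List counters where one CPU handled at least `share` of the total.

    Both thresholds are arbitrary illustration values, not Novell guidance.
    """
    out = []
    for name, cpu0, cpu1, total in rows:
        if total >= min_total and max(cpu0, cpu1) / total >= share:
            out.append((name, 0 if cpu0 >= cpu1 else 1, total))
    return out


if __name__ == "__main__":
    for name, cpu, total in skewed(parse_rows(SAMPLE)):
        print(f"{name}: CPU {cpu} handled nearly all of {total:,} events")
```

On the sample rows this picks out Normal WorkToDo's on CPU 0 and IO
Adapter Polling Calls on CPU 1. It only highlights the imbalance; it does
not say what is scheduling the work.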

Note that at the moment the server is in the cluster but is not running
any services. It holds some partition replicas, so it is doing some
skulking and authentication, but apart from that it's not up to much.

I'm not sure where to look next in order to locate the problem, so any
advice would be helpful.

Rgds