I have had a strange issue the last few weeks that I am hoping someone can
shed some light on.

I have one NetWare 6 SP5 server that holds a r/w replica of every partition
in our tree (about 95 of them). It is running IDM 2 and File System Factory.

This server will sometimes restart for no apparent reason, but that is not
what has me stumped. Within 2 to 3 minutes of losing that server I will lose
about 40 more servers in our WAN, which is about 1/3 of the servers I have.
These servers are all in the same tree and each holds a r/w of the partition
they are in.

Nothing is written to the abend log for the first server, and the Compaq log
says: "ASR Lockup Detected: Code executing at NetWare OS address 219A9Ah
when ASR NMI occured." The other servers report similar vague errors, such
as a page fault or an abend in various NLMs. I can find no consistency in
the errors.

Can these server abends be related to the first server going down, or is
that just crazy? This is the fourth time this has happened. The set of
affected servers varies, but the one that goes down first is always the same.

The only update we have scheduled for all servers is the CA virus files, and
that runs at 5 AM. The restarts do not occur anywhere near that time. This
happens about every 2 weeks, but not at the same time of day.

I cannot identify anything else these servers have in common. Some are
NetWare 6 SP5, some 6.5 SP4a, and some 6.5 SP5. They are all Compaq ProLiant
ML370s, but not the same generation.

Any and all ideas are appreciated.