There are two linked sites, in separate cities; both run NetWare 6.5 SP7
on all servers. Each site has a primary file/print server, a Groupwise
server that also is the DNS & DHCP server for the site, and a backup server.

This morning, users could not log on at all at the remote site. The
primary file server at their site seemed hung to me, but I was able to
get into it via RCONSOLE & NRM, and could not spot any reason for the
problem. Checking the error logs, there is only one error shown at all,
and that's in the Health Log:
"DS Thread Usage on server AIRP was in a SUSPECT State"
at 12:57am.

The Groupwise server at the site shows the following message:
"Cannot update NDS against an entry within subnet container:"
(note there is the period at the end of the error) at 5:40am. However
there was no DHCPSRVR.LOG file by the time I looked at it, at approx.
8:30 am.

There was a backup running at the time of the error, and it got hung
completely, which is not unusual (with Backup Exec for Netware, *sigh*)
if communication with the target server is lost.

The major problem is that users at the site could not log in at all when
they came in at approx. 7:30am. All servers were available via RCON &
NRM, but I couldn't pull up a files list on the AIRP server from the
other site, as I usually can; it just hung the workstation to even try.


I'd like to pinpoint why the problem occurred & how to fix or guard
against it.

There have been recent changes to the subject server, although all were over
a week ago:

- Memory added to bring it to 8GB approx. a month ago
- A failed RAID drive replaced approx. a month ago
- Norton Antivirus, latest version, made active approx. 8 days ago
- Memory tuning on the server, using Memcalc, approx. 8 days ago

None of the servers have shown any problems in the 8 days prior to this,
although the AIRP server has consistently shown the following error
averaging twice/day, and the extra memory & tuning do not seem to have
helped:
"CPU Utilization-0 on server AIRP was in a SUSPECT state" showing some
value between 94 & 100.

Looking for thoughts on how to diagnose this without much by way of real
error messages, since it does not seem to have been a switch issue as I
originally thought possible (since I see no sign of network dropping on
any of the servers). I'm also looking for ways to fix it and/or guard
against it happening again.

Restarting the primary file/print server at the site did resolve the
problem, but I really don't want it happening again & I don't know where
to start. My only thought is that the server seems to be getting
overwhelmed & perhaps something critical happened between the backup &
NAV, even though it was fine the week before ... or perhaps something
wrong with the memory tuning (?)