26-Oct-2009, 02:20 PM
DE

DNS server issue hangs entire site

There are two linked sites in separate cities, both running Netware 6.5SP7
on all servers. Each site has a primary file/print server, a Groupwise
server that is also the DNS & DHCP server for the site, and a backup server.

This morning, users could not log on at all at the remote site. The
primary file server at their site seemed hung to me, but I was able to
get into it via RCONSOLE & NRM, and could not spot any reason for the
problem. Checking the error logs, there is only one error shown at all,
and that's in the Health Log:
"DS Thread Usage on server AIRP was in a SUSPECT State"
at 12:57am.

The Groupwise server at the site shows the following message:
"Cannot update NDS against an entry within subnet container:
dhcp_subnet.Portland.Air."
(note the period at the end of the message) at 5:40am. However, there
was no DHCPSRVR.LOG file when I looked for it at approx. 8:30am.

There was a backup running at the time of the error, and it got hung
completely, which is not unusual (with Backup Exec for Netware, *sigh*)
if communication with the target server is lost.

The major problem is that users at the site could not log in at all when
they came in at approx. 7:30am. All servers were available via RCON &
NRM, but I couldn't pull up a file list on the AIRP server from the
other site, as I usually can; even trying hung the workstation.
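
To make the sequence easier to see, here is a minimal sketch that sorts the times reported above into one timeline and shows the gap between events. It assumes all four times fall on the same morning, and the two "approx." times are only approximate; the variable and function names are purely illustrative.

```python
from datetime import time

# (time, source, event) as reported above. Assumption: all on the same
# morning. The 7:30 and 8:30 entries are approximate.
events = [
    (time(7, 30), "users", "logins fail at the remote site (approx.)"),
    (time(0, 57), "AIRP Health Log", "DS Thread Usage in SUSPECT state"),
    (time(8, 30), "Groupwise server", "no DHCPSRVR.LOG found (approx.)"),
    (time(5, 40), "Groupwise server",
     "Cannot update NDS against an entry within subnet container"),
]


def minutes(t):
    """Minutes since midnight for a datetime.time value."""
    return t.hour * 60 + t.minute


def timeline(evts):
    """Sort events by time and attach the gap in minutes to the previous one."""
    rows = []
    prev = None
    for when, source, what in sorted(evts, key=lambda e: e[0]):
        gap = None if prev is None else minutes(when) - minutes(prev)
        rows.append((when, gap, source, what))
        prev = when
    return rows


for when, gap, source, what in timeline(events):
    gap_text = "      -" if gap is None else f"+{gap:4d}m"
    print(f"{when:%H:%M} {gap_text}  {source}: {what}")
```

Laid out this way, the DS thread warning comes first, the DHCP/NDS update failure follows several hours later, and the login failures are only noticed after that.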

==============

I'd like to pinpoint why the problem occurred & how to fix or guard
against it.

There were some recent changes to the server in question, although all
were made over a week ago:

- Memory added to bring it to 8GB approx. a month ago
- A failed RAID drive replaced approx. a month ago
- Norton Antivirus, latest version, made active approx. 8 days ago
- Memory tuning on the server, using Memcalc, approx. 8 days ago

None of the servers have shown any problems in the 8 days prior to this,
although the AIRP server has consistently shown the following error
averaging twice/day, and the extra memory & tuning do not seem to have
helped:
"CPU Utilization-0 on server AIRP was in a SUSPECT state" showing some
value between 94 & 100.

I'm looking for thoughts on how to diagnose this without much in the way
of real error messages. It does not seem to have been a switch issue, as
I originally thought possible, since I see no sign of the network
dropping on any of the servers. I'm also looking for ways to fix it
and/or guard against it happening again.
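
Since the logs recorded so little, one option for next time would be a small probe run from a workstation at the other site, so there is at least a timestamped record of when the server stopped answering. This is only a rough sketch under assumptions: that the server name resolves from that workstation, and that a plain TCP connection to port 524 (the NCP port) is a fair stand-in for "can users reach the server". The function names and the `probe.log` filename are made up for illustration.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=5.0):
    """Resolve host and try one TCP connection; return (ok, seconds, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:  # covers name lookup failures, refusals, timeouts
        return False, time.monotonic() - start, type(exc).__name__


def log_line(stamp, host, port, ok, seconds, detail):
    """Format one result as a single timestamped log line."""
    status = "OK  " if ok else "FAIL"
    return f"{stamp:%Y-%m-%d %H:%M:%S} {status} {host}:{port} {seconds:.2f}s {detail}"


def watch(targets, rounds, interval=60.0, logfile="probe.log"):
    """Probe every (host, port) target once per round and append to logfile."""
    for i in range(rounds):
        with open(logfile, "a") as fh:
            for host, port in targets:
                ok, seconds, detail = probe(host, port)
                fh.write(log_line(datetime.now(), host, port, ok, seconds, detail) + "\n")
        if i < rounds - 1:
            time.sleep(interval)
```

Something like `watch([("AIRP", 524)], rounds=1440)` would then log one line a minute for a day. It would not explain the hang, but a name lookup failure versus a connection timeout would at least hint at whether DNS or the file server itself went first.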

Restarting the primary file/print server at the site did resolve the
problem, but I really don't want it happening again & I don't know where
to start. My only thought is that the server seems to be getting
overwhelmed & perhaps something critical happened between the backup &
NAV, even though it was fine the week before ... or perhaps something is
wrong with the memory tuning (?)

Thanks.


All times are GMT -6.