Problem Description:
School network of approximately 150 workstations. Two or three times per
day, anyone who attempts to log in is suddenly unable to connect to one
specific server of the two that they map drives to. This server holds all
of their profile information and home directories. As soon as it happens
to one person anywhere in the building, anyone else who attempts to log in
experiences the same issue (workstation-only logins are unaffected).
If I run DSREPAIR -U on the problem server, the issue goes away and
everything is fine again. Last school year this happened maybe once a
week; this year it is happening several times a day. The only real change
is that I upgraded the UNaffected server from NW6 to NW6.5 SP6 and the
problem server from NW6.5 SP3 to SP6.

I am looking for a solution to this problem or for troubleshooting steps
that I should be taking. Below I have given a description of our
environment and several other pieces of information that I hope will be
helpful in troubleshooting this problem. Thanks in advance for your help.

Joshua Schoeneck
joshua@[NOSPAM]kmlhs.org

Environment Description:
3 Netware 6.5 servers - MAIN (SP6), JOSIAH (SP6), LUKE (SP3)
1 Windows 2003 server with eDirectory installed - GWISE7

100/1000 switch LAN

Clients are WinXP SP2 with up to date patches and NWClient 4.91 SP3
Clients map drives and connect only to MAIN and JOSIAH

All users are set up with DLU through ZenWorks 6.5
Student users are assigned to local USERS group in Windows



Here's what I see on the workstation when the login script runs while the
problem is occurring:

Your current context is stu.KML
User: S2011004 Context: stu.KML
Your current tree is: KML
You are attached to server MAIN.
Drive H: = MAIN_SYS.:
JOSIAH
LOGIN-LGNWNT32.DLL-890: The specified server is unknown.
LOGIN-LGNWNT32.DLL-430: The following drive mapping operation could not be
completed.
[ROOT I:=\\JOSIAH\VOL1\SUSERS\S2011004]
The error code was 8884.
JOSIAH
LOGIN-LGNWNT32.DLL-890: The specified server is unknown.
LOGIN-LGNWNT32.DLL-430: The following drive mapping operation could not be
completed.
[ROOT O:=\\JOSIAH\VOL1\SHARE]
The error code was 8884.
Drive P: = MAIN_SYS.:\APPS
S1: = Z:. [MAIN_SYS.:\PUBLIC ]
S2: = Y:. [MAIN_SYS.:\PUBLIC\NLS ]
Drives C,D,E map to a local disk.
Drive H: = MAIN_SYS.:
Drive P: = MAIN_SYS.:\APPS
----- Search Drives -----
S1: = C:\WINDOWS\system32
S2: = C:\WINDOWS
S3: = C:\WINDOWS\System32\Wbem
S4: = C:\Program Files\Lightspeed Systems\SecurityAgent
S5: = C:\Program Files\Common Files\GTK\2.0\bin
S6: = C:\Program Files\QuickTime\QTSystem
S7: = C:\WINDOWS\system32\nls
S8: = C:\WINDOWS\system32\nls\ENGLISH
S9: = C:\Program Files\Novell\ZENworks
S10: = X:. [MAIN_SYS.:\PUBLIC ]
S11: = Z:. [MAIN_SYS.:\PUBLIC ]
S12: = Y:. [MAIN_SYS.:\PUBLIC\NLS ]
LOGIN-LGNWNT32.DLL-310: The workstation date and time could not be set.
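
The failed mappings in a capture like the one above can be extracted
mechanically, which makes it easier to compare captures from different
workstations. The following is a minimal Python sketch, not part of any
Novell tool; the function name summarize_login_output and the regular
expressions are assumptions based only on the message layout shown here.

```python
import re
from collections import Counter

# Patterns are based only on the login output shown in this post;
# other client versions may word or lay out these messages differently.
MSG_RE = re.compile(r"^LOGIN-LGNWNT32\.DLL-(\d+):\s*(.*)$")
MAP_RE = re.compile(r"^\[(ROOT\s+)?([A-Z]):=(.*)\]$")
CODE_RE = re.compile(r"^The error code was ([0-9A-Fa-f]+)\.$")


def summarize_login_output(text):
    """Return (Counter of LOGIN message numbers, list of failed mappings)."""
    messages = Counter()
    failed = []
    pending = None  # a mapping line seen, still waiting for its error code
    for raw in text.splitlines():
        line = raw.strip()
        m = MSG_RE.match(line)
        if m:
            messages[m.group(1)] += 1
            continue
        m = MAP_RE.match(line)
        if m:
            pending = {"drive": m.group(2), "target": m.group(3), "code": None}
            failed.append(pending)
            continue
        m = CODE_RE.match(line)
        if m and pending is not None:
            pending["code"] = m.group(1)
            pending = None
    return messages, failed
```

Run against a saved capture, this would list each drive letter that failed
to map together with its error code, so captures taken during and outside
an incident can be compared side by side.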




If I then run DSREPAIR -U on the server JOSIAH, I get this in the
DSREPAIR.LOG:
(Note: running DSREPAIR -U on any of the other servers does not appear to
correct the problem.)

/****************************************************************************/
NetWare 1602.00 Directory Services Repair 10551.26, DS 10553.73
Log file for server ".JOSIAH.KML" in tree "KML"

** Automated Repair Mode **
Repairing Local Database
Start: Friday, September 7, 2007 9:10:51 am Local Time

** All disk amounts are approximations **
->Current available disk space: 68853 MB
->DSRepair may need to use: 48 MB
->Disk space remaining after operation: 68828 MB

Physical Check
Creating Temporary Files
Repair Trees - Scan Values
Repair Trees - Sorting Values
Repair Trees - Scan Entries
Repair Trees - Sorting Entries
Repair Trees - Check Values
Repair Trees - Check Entries
Total Objects in Database: 4981
Total Objects in Schema : 2605
Total External References: 1
Total Objects in Replicas: 2372
Schema Check

Repairing objects in a replica
Start: Friday, September 7, 2007 9:11:26 am Local Time
[1 of 1] Read Write : T=KML
Total objects in partition - T=KML : 2372
Repairing objects - done(1000)
Repairing objects - done(2000)
Repairing objects - done(2372)

Total Objects = 2372, UNKNOWN class objects = 0, Total Values = 87117

Checking local references
Start: Friday, September 7, 2007 9:11:32 am Local Time

ERROR: Invalid reference to object, ID: 0000922F, value is purged
Attribute 46E15ACE, Reference
Object ID: 0000855E, DN: CN=10_1_103_13.OU=equipment.O=KML.T=KML


ERROR: Adding reference to object ID: 000087C3, DN:
CN=10_1_151_53.OU=equipment.O=KML.T=KML
Referenced by attribute: 00000516, zenwmLoggedInWorkstation
of object ID: 0000824E, DN: CN=dwilcox.OU=fac.O=KML.T=KML


ERROR: Invalid reference to object, ID: 0000923C, value is purged
Attribute 46E15944, Reference
Object ID: 000085F2, DN: CN=10_1_103_3.OU=equipment.O=KML.T=KML


ERROR: Invalid reference to object, ID: 0000905D, value is purged
Attribute 46E1556F, Reference
Object ID: 00008CBC, DN: CN=10_1_108_1.OU=equipment.O=KML.T=KML


ERROR: Adding reference to object ID: 00008814, DN:
CN=10_1_151_52.OU=equipment.O=KML.T=KML
Referenced by attribute: 00000516, zenwmLoggedInWorkstation
of object ID: 00008273, DN: CN=wbreimon.OU=fac.O=KML.T=KML


ERROR: Adding reference to object ID: 00008A8E, DN:
CN=10_1_151_58.OU=equipment.O=KML.T=KML
Referenced by attribute: 00000516, zenwmLoggedInWorkstation
of object ID: 00008277, DN: CN=wzimmerm.OU=fac.O=KML.T=KML


ERROR: Invalid reference to object, ID: 00009186, value is purged
Attribute 46E15723, Reference
Object ID: 000084AE, DN: CN=10_1_103_19.OU=equipment.O=KML.T=KML


ERROR: Invalid reference to object, ID: 00009169, value is purged
Attribute 46E15621, Reference
Object ID: 00008537, DN: CN=10_1_103_20.OU=equipment.O=KML.T=KML

NOTICE: CN=Internet Users.OU=stu.O=KML.T=KML, has: 1555 value references
Repairing single object:

Object ID: 00008032, [Pseudo Server]

Total Objects = 1, UNKNOWN class objects = 0, Total Values = 109
Creating Old Files
Temporary DIB set replacing NDS working DIB set.
Checking mail directories
Checking stream syntax files
Repair process completed, total errors found = 8

** Automated Repair Mode **
Repairing Server Network Addresses
Start: Friday, September 7, 2007 9:11:50 am Local Time
****************************************************************
This operation will search IPX, SLP, and DNS tables, if
available, in order to validate network address attributes of
NCP_SERVERS, as well as the referrals on replica objects.
If the information on DNS/DHCP tables or the /etc/hosts file is
incorrect, then we cannot guarantee proper validation of network
addresses, nor replica referral updates. This is particularly true
if the system uses unregistered DHCP addresses or the information
in the HOSTS file is incorrect or outdated
****************************************************************

Checking server: .GWISE7.KML
Found a Network Address Property on the server object and through SLP:
Address Type = TCP, data[6] = 10.1.0.15:524
Found a Network Address Property on the server object and through SLP:
Address Type = UDP, data[6] = 10.1.0.15:524

Checking server: .LUKE.KML
Found a Network Address Property on the server object and through SLP:
Address Type = TCP, data[6] = 10.1.0.13:524
Found a Network Address Property on the server object and through SLP:
Address Type = UDP, data[6] = 10.1.0.13:524
Checking server address in Replica ID: 3, .[Root].

Checking server: .MAIN.KML
Found a Network Address Property on the server object and through SLP:
Address Type = IPX, data[12] = 0000000A0000000000010451
Found a Network Address Property on the server object and through SLP:
Address Type = TCP, data[6] = 10.1.0.1:524
Found a Network Address Property on the server object and through SLP:
Address Type = UDP, data[6] = 10.1.0.1:524
Checking server address in Replica ID: 1, .[Root].

Checking server: .JOSIAH.KML
Found a Network Address Property on the server object and through DNS:
Address Type = TCP, data[6] = 10.1.0.17:524
Checking server address in Replica ID: 4, .[Root].

** Automated Repair Mode **
Repairing replica ring
Start: Friday, September 7, 2007 9:11:50 am Local Time

Replica Ring for replica: .[Root].
Remote server's local ID: 0000833B
Remote server's replica root ID: 00008016
Remote server name is: .MAIN.KML
OK - Authenticated to server
Remote server's local ID: 00008334
Remote server's replica root ID: 00008043
Remote server name is: .LUKE.KML
OK - Authenticated to server
Remote server's local ID: 0000803A
Remote server's replica root ID: 00008038
Remote server name is: .JOSIAH.KML
OK - Authenticated to server

** Automated Repair Mode **
Volume Object and Trustee Check
Start: Friday, September 7, 2007 9:11:50 am Local Time

Volume: SYS, object ID: 0000805A, CN=JOSIAH_SYS.O=KML.T=KML
Checking trustees on volume: SYS

Volume: VOL1, object ID: 0000805C, CN=JOSIAH_VOL1.O=KML.T=KML
Checking trustees on volume: VOL1

Volumes checked: 2

** Automated Repair Mode **
Finish: Friday, September 7, 2007 9:11:50 am Local Time
Total repair time: 0:01:00



The only errors that I ever see in the DSREPAIR.LOG are the INVALID
REFERENCE errors about the workstations. Sometimes they are workstations
that are experiencing the problem and sometimes they aren't.
Using DSBROWSE to look up the reference numbers, I can see that the
invalid references are to user objects. Sometimes these are affected
users and sometimes they are not.
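
To compare these entries across several repair runs, the ERROR blocks can
be pulled out of the log and grouped by type. This is a minimal Python
sketch under stated assumptions: the names dsrepair_errors and _parse are
hypothetical, and the parsing assumes the layout of the log excerpt in
this post (each entry starts with "ERROR:" and ends at a blank line).

```python
import re

# Assumption: DNs in ERROR entries contain no spaces, as in the excerpt
# in this post. A DN with a space in it would be cut off at the space.
DN_RE = re.compile(r"CN=\S+")
KIND_RE = re.compile(r"^ERROR:\s*(.+?)\s+to object")


def _parse(block):
    """Turn the lines of one ERROR entry into (kind, [DNs])."""
    joined = " ".join(block)  # undo line wrapping inside the entry
    kind = KIND_RE.match(joined)
    return (kind.group(1) if kind else "unknown", DN_RE.findall(joined))


def dsrepair_errors(log_text):
    """Yield (kind, [DNs]) for each ERROR entry in a DSREPAIR log."""
    block = []
    # The extra empty string flushes an entry that ends at end of file.
    for raw in log_text.splitlines() + [""]:
        line = raw.strip()
        if line.startswith("ERROR:"):
            if block:
                yield _parse(block)
            block = [line]
        elif block and line:
            block.append(line)
        elif block:  # a blank line ends the current entry
            yield _parse(block)
            block = []
```

Counting the first element of each result per log file would show whether
the same workstation or user objects recur from one incident to the next.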


On all servers, a DSREPAIR synchronization check always comes up clean:

/****************************************************************************/
NetWare 1602.00 Directory Services Repair 10551.26, DS 10553.73
Log file for server ".JOSIAH.KML" in tree "KML"
Time synchronization and server status information
Start: Friday, September 7, 2007 9:56:40 am Local Time

---------------------------+---------+---------+-----------+--------+-------
                            DS.NLM    Replica   Time        Time is  Time
Server name                 Version   Depth     Source      in sync  +/-
---------------------------+---------+---------+-----------+--------+-------
.GWISE7.KML                 20112.91  -1        Non-NetWare Yes      0
.LUKE.KML                   10551.78  0         Secondary   Yes      0
.MAIN.KML                   10553.73  0         Reference   Yes      0
.JOSIAH.KML                 10553.73  0         Secondary   Yes      0
---------------------------+---------+---------+-----------+--------+-------

*** END ***


Here's an SLPINFO /ALL from one of the affected PC's while the problem is
occurring:
(It doesn't look any different when the problem is not occurring.)

*****************************************************
*** Novell Client for Windows NT ***
*** Service Location Diagnostics ***
*****************************************************

SLP Version: 4.91.3.0
SLP Start Time: 8:15:28am 9/7/2007
Last I/O: 8:15:39am 9/7/2007
Total Packets: Out: 3 In: 2
Total Bytes: Out: 132 In: 137


SLP Operational Parameters Values
------------------------------- ------------
Static Scopes YES
Static Directory Agents YES
Active Discovery YES
Use Broadcast for SLP Multicast NO
Use DHCP for SLP YES
SLP Maximum Transmission Unit 1400 bytes
SLP Multicast Radius 32 hops


SLP Timers Values
------------------------------------- ------------
Give Up on Requests to SAs 15 seconds
Close Idle TCP Connections 5 minutes
Cache SLP Replies 1 minutes
SLP Default Registration Lifetime 10800 seconds
Wait Before Giving Up on DA 5 seconds
Wait Before Registering on Passive DA 1-2 seconds


Scope List Source(s)
---------------------------------------- ------------
MAIN_SCOPE CNFG


DA IP Address   Source(s) State Version Local Interface Scope(s)
--------------- --------- ----- ------- --------------- ---------------
10.1.0.1        CNFG      UP    SLPV2   10.1.103.18     MAIN_SCOPE


Local Interface 10.1.103.18
---------------------------------
Operational State: UP
Operating Mode(s): MCAST,STATIC-DA
SA/UA Scopes: MAIN_SCOPE
Last I/O: 8:15:39am 9/7/2007
Total Packets: Out: 3 In: 2
Total Bytes: Out: 132 In: 137
Last Addr Out: 10.1.0.1
Last Addr In: 10.1.0.1




If you have a better suggestion for what forum to post this in, please let
me know. Thanks.