We are looking to reduce the impact on our users during server downtime
and upgrades. Windows clients running Novell Client32 should not notice
when a replica server is down or rebooted, but we find that something
cached on the client machine often tries to reconnect to a particular
server instead of just communicating with any available replica server.

Our environment:
We have three servers, DS1, DS2, and DS3, that hold the Master and R/W
replicas for all of our partitions. Currently, the clients are running
Windows 2000 with Novell Client 4.9.1.x. In the Client settings, the
'Preferred Tree' is set to TREE.domain.com, which resolves to DS1, DS2, or
DS3 in a DNS round robin. No preferred server is set. SLP is set to Static
with the proper scope and DAs listed.
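
To see which addresses the round-robin name is handing out, a quick check
along these lines can be run from any machine with Python. This is only a
rough sketch: resolve_all is a made-up helper name, and port 524 (NCP over
IP) is passed mainly because getaddrinfo wants a port.

```python
import socket

def resolve_all(hostname, port=524, resolver=socket.getaddrinfo):
    """Return the unique addresses a round-robin name currently resolves to.

    resolver is injectable so the function can be exercised without DNS.
    """
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        addr = info[4][0]  # sockaddr is the 5th field; its first item is the IP
        if addr not in addresses:
            addresses.append(addr)
    return addresses
```

Calling resolve_all("TREE.domain.com") should list the addresses of DS1,
DS2, and DS3 if the round robin is set up the way I described.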

When a client connects and authenticates to the tree, it gets one of the
DS# servers as its Primary server. If that server goes down or is
rebooted, there is a chance that any users connected to it will be
affected in some way. They could lose applications in ZfD, or run into
rights or timeout issues, because when the client needs to talk to eDir it
tries to reattach to the server it previously talked to instead of picking
an available one.

If there are any registry settings or client parameters I can check to fix
this, that would be great. One solution that was suggested was to use our
Cisco CSS load balancing switch to front the servers. Pointing
TREE.domain.com at the CSS would do the same job as DNS round robin, except
that it could determine whether a server is up before directing the client
to it. While that would be better, I think it would not matter once the
connection has been made, because it seems the client would go through the
CSS to a server, and that server would be populated as the Primary server.
After that, the client would no longer look for TREE.domain.com, but would
make requests directly to the DS server that is its Primary server.
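
To make the CSS idea concrete, here is a rough Python sketch of the kind of
up/down check I have in mind. It assumes NCP over IP is listening on TCP
port 524 on each replica server. The names is_up and pick_available are
made up for illustration, and this is not how the CSS itself is configured.

```python
import socket

NCP_PORT = 524  # assumption: NCP over IP on its registered TCP port

def is_up(host, port=NCP_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_available(servers, probe=is_up):
    """Return the first server that passes the probe, or None if all fail."""
    for server in servers:
        if probe(server):
            return server
    return None
```

Something like pick_available(["DS1", "DS2", "DS3"]) only helps at the
moment a client first connects. It does nothing for a client that already
holds DS1 as its Primary server, which is exactly my concern.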

Has anybody tried this scenario? It works great for short-lived,
per-request traffic like LDAP, but for eDir/NCP, I don't think it would
make a difference. Any other ideas on how we can make it so the users do
not notice any downtime on our replica servers? We use clustering on the
file servers, but we have not implemented it on the replica servers because
eDir should handle it.