I have a SLES 11 SP4 / OES 2015.1 server that has been giving me trouble for a while. eDirectory stops responding at random times. When it does, I can stop the service with the command rcndsd stop, but this can take ten minutes or so to complete, and the log file says "WARNING: ndsd process is still running. Killing ndsd." I can then start it back up and everything is fine. It may run for a couple of months, or it may stop again the next day. Over time the failures have become more frequent (from every 4 to 6 months to every week or two), so I set up a cron job to restart eDirectory every weekend. This has prevented most of the problems, but eDirectory can still restart one weekend and stop responding again the following Monday or Tuesday.
I have also noticed that when eDirectory starts to misbehave, logins get slower for some people, work fine for others, and a lot of people cannot log in at all. In the past I could always tell something was wrong by looking at the server processes in top, because ndsd would be using a lot of CPU. Recently, though, I've noticed that when this problem occurs, ndsd's CPU usage is really not that high.


We only have about 1200 users and 700 workstations. This server holds the master replica and is a VM with 4 threads and 11 GB of RAM. The problem existed even before it was converted from a bare-metal system to a VM a few years ago. It is in a replica ring with 3 other servers holding read/write replicas, and it only runs DS, LDAP, Storage Manager, and iManager. The other servers have never had an issue; they also host iPrint printers, Storage Manager, and NSS volumes for our users. ndsd is pretty quiet most of the time and only shows relatively high CPU usage every once in a while for a short period, so I don't think it's a load issue.
We have a ZENworks 2018 system, an iBoss web filter, a GroupWise system, and a few other systems that all query this server. The "other systems" generate very few transactions; by far the most frequent requests come from ZENworks and the iBoss web filter. I ran an LDAP trace following document 7007106 to gather some information.

I may have gone overboard with the description, but I'm at a loss as to what the problem is and where to look. I'm also not very good at troubleshooting eDirectory or LDAP issues; it tends to go over my head for the most part. The seemingly random nature of the issue also makes it very hard to narrow down what or where the problem is.

Thanks for any help or direction you can give,
Michael