Appreciate the info, Sam. Thanks.


On 6/5/2012 5:06 PM, samthendsgod wrote:
> Hi Everyone,
> I wanted to share information about a problem we ran into last night,
> and how we fixed it (with the help of opening an SR with Novell). We
> have a mixed environment of 881 and 886 servers. (We are migrating from
> the 881 servers and should be done in a month.) Last night we were
> adding replicas to one of the new 886 servers. Everything went as
> expected. After the last replica turned On, I started going to each of
> the 20 servers in the tree to check sync status, and just making sure
> there were no problems (not only because that's a good idea, but 881 is,
> well, 881.)
> All of a sudden, on one server, while running ndsrepair -E, I started
> seeing a -699 Remote error. Having never seen this before, I went to
> for information on what it was, if it was cosmetic or
> a problem, and what to do about it. I looked around for maybe three
> minutes, then went back to the server and ran ndsrepair -E again. Now
> there were about 40 reports of -699 Remote. To make a long story short,
> I ultimately ended up with a few hundred -699 Remote errors in report
> sync status. This was happening on every server in the tree.
> Additionally, on the remote server that initially showed up with the 699
> error, I was seeing -755 errors in ndstrace ("Verification Failed").
> This was another new one on me. Lastly, other servers reported -625
> errors when communicating with the affected servers.
> Keeping the thought of making a long story short, I called Novell for
> help. What we found was that one of the servers (the one that initially
> was the remote server showing the -699 error) had, for whatever
> reason, assigned its own IP address for TCP and UDP as values in the
> Network Address attribute of every other server in the tree (in
> its own DIB), as reported in iMonitor. The rest of the servers in
> the tree were fine (with regard to how they saw the Network
> Address attribute values for the other servers in the tree), although
> they were all getting more and more -699 Remote errors in report sync
> status. And of course, nothing would sync due to this bad value in the
> Network Address attribute on the one server.
> This tree was set up without SLP. The tree is used only for
> authorization via LDAP queries. Every server holds a copy of any
> partition that users or consumers would query against, so while I
> personally would like to see SLP running, it's worked fine this way for
> six or seven years.
> To fix this, what we ended up doing was to put a hosts.nds file in the
> same directory as nds.conf. The file looked very similar to an
> /etc/hosts file. For instance:
> FS1
> FS2
> FS3
> where the first column is the server name and the second column is
> the server's IP address and the port on which ndsd is running.
> After populating this file, we ran ndsrepair -N and repaired all
> network addresses. We only had to do this on the one server that
> initially was being reported with the -699 Remote (the server that had
> the extra value on the Network Address attribute for the rest of the
> servers in its own dib), and the problem started clearing itself up.
> However, we went ahead and performed the same steps on the rest of the
> servers in the tree.
> I wanted to share this because I didn't find anything that was very
> helpful in searching or Google. There were some TIDs and
> suggestions, but they were rather complicated, and most carried
> warnings that performing the steps could actually do more harm than
> good. The steps I've outlined above, though, are much safer.
> I hope this may help someone else out in the future.
> Take care,
> Sam
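
The hosts.nds layout Sam describes (server name in the first column, IP address and ndsd port in the second) is simple enough to sanity-check before running ndsrepair -N. Below is a minimal Python sketch, assuming each entry is `NAME IP:PORT` on a single line. The function names, the rule of skipping `#` lines, and the sample addresses (from the 192.0.2.0/24 documentation range, with port 524) are illustrative assumptions, not taken from Novell's documentation. The second function flags any address listed for more than one server, which is the same pattern as the bad Network Address values described above.

```python
import ipaddress
from collections import defaultdict


def parse_hosts_nds(text):
    """Parse hosts.nds-style lines of the assumed form 'NAME IP:PORT'.

    Blank lines and lines starting with '#' are skipped; treating '#' as a
    comment marker is an assumption, not confirmed against Novell's docs.
    Returns {name: (ip, port)} and raises ValueError on a malformed line.
    """
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'NAME IP:PORT', got {raw!r}")
        name, addr = fields
        host, sep, port = addr.rpartition(":")
        if not sep:
            raise ValueError(f"line {lineno}: missing ':PORT' in {addr!r}")
        ip = ipaddress.IPv4Address(host)  # AddressValueError is a ValueError
        port_num = int(port)  # ValueError if the port is not numeric
        if not 0 < port_num < 65536:
            raise ValueError(f"line {lineno}: port out of range: {port_num}")
        entries[name] = (str(ip), port_num)
    return entries


def shared_addresses(entries):
    """Return {(ip, port): [names]} for addresses claimed by more than one server."""
    by_addr = defaultdict(list)
    for name, addr in entries.items():
        by_addr[addr].append(name)
    return {addr: names for addr, names in by_addr.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sample: FS3 was accidentally given FS1's address.
    sample = """
FS1 192.0.2.11:524
FS2 192.0.2.12:524
FS3 192.0.2.11:524
"""
    print(shared_addresses(parse_hosts_nds(sample)))
```

Run against the hypothetical sample, this prints `{('192.0.2.11', 524): ['FS1', 'FS3']}`. An empty result only means the file itself is consistent; it says nothing about what each server has already stored in its own DIB, which is what ndsrepair -N repairs.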