Yesterday a user created a group with 755 owners and 16 members (why, I
have no idea; we're trying to find out). That seems to have put
replication for the partition that holds group objects into a tailspin:
max ring delta is approaching 19 hours at this point, and waiting
overnight for it to fix itself has not worked. We have an SR open, but
I was wondering if anyone has seen similar behavior or has any
suggestions about how to fix it.

We're seeing -625 and -626 errors appear randomly between servers even
though there is no network issue (I can make manual connections on
TCP/524 between any two servers without a problem). Running ndsrepair -N
seems to clear up a particular error for a short time, but the fix is
not permanent. ndstrace shows the servers busy working on the group in
question for the most part. Replication of other partitions appears to
be unaffected (a bit slow due to overall load, but not abnormal).
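
For anyone who wants to repeat the connectivity check, here is a rough
sketch of how the manual TCP/524 test could be scripted. It only tests
from the machine it runs on, so it would need to be run on each server
in the ring to cover every pair. The function names and the host names
in the usage note are made up for illustration, and an open port says
nothing about replication health.

```python
import socket

NCP_PORT = 524  # the NCP port mentioned above


def can_connect(host, port=NCP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False


def check_hosts(hosts, port=NCP_PORT, timeout=3.0):
    """Map each host to whether it is reachable on the port from this machine."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling check_hosts(["server1.example.com", "server2.example.com"])
(placeholder names) returns a dict of host to True/False, which makes it
easy to spot a server that is intermittently unreachable.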

My guess is that the ACLs for all those owners are at fault, although
I'm not quite sure why. Mainly just asking if anyone else has seen
behavior like this. Thanks.


--
brucetimberlake