On Wed, 23 May 2012 18:06:01 +0000, samthendsgod wrote:

> Greetings!
> We're running edir with a mix of 8.8.1 and 8.8.6 on SLES. No OES. When
> running ndsrepair -C -Ad -A, some servers are reporting obits with flags
> that say
> Flags = 7f9800000000
> or a similar large hex number (as opposed to Flags = 0000, 0002, or
> 0004). On these particular obits, the flag number always starts with 7,
> followed by three digits, and then seven zeros, with last digit being
> either a 0 or 2 (so far as I've seen).

Can't say that I've seen this, but my guess is that the Flags value is
corrupt. What should be 0000 is coming out as 7f9800000000, what should
be 0002 is coming out as 7f9800000002.
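
To illustrate that guess, here's a minimal Python sketch. It assumes (my assumption only, nothing documented) that the real flag lives in the low 16 bits of the value and that everything above that is garbage. The function name is made up for illustration.

```python
def split_obit_flags(flags_hex):
    """Split a Flags hex string into (unexpected_high_bits, low_16_bits)."""
    value = int(flags_hex, 16)
    low = value & 0xFFFF   # assumed location of the real obit flag
    high = value >> 16     # anything non-zero here is unexpected
    return high, low


if __name__ == "__main__":
    # Values taken from the post above.
    for raw in ("0002", "7f9800000000", "7f9800000002"):
        high, low = split_obit_flags(raw)
        print(f"{raw}: high bits = {high:#x}, low flag = {low:04x}")
```

If the low 16 bits consistently come out as 0000 or 0002, that would fit the "corrupt high bits" theory, but it doesn't prove it.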

Are these obits still needed? You could check whether the objects they
refer to still exist. -xk3 and a re-backlink may help clear out the cruft
here.

> Some of them have very odd CTS and MTS timestamps on them as well
> Value CTS : 12-11--51
> or
> Value MTS = 01-06--50

Also possibly just DIB corruption.
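
If you want to count how many obits are affected, a rough Python sketch like this could scan a saved copy of the repair log. The line formats are assumed from the snippets quoted in this thread ("Flags = ...", "Value CTS : ...", "Value MTS = ..."); the real log may differ. Treating a double dash as the sign of a bad timestamp is also just a guess, and the function name is hypothetical.

```python
import re

# Patterns assumed from the log excerpts quoted in this thread.
FLAGS_RE = re.compile(r"Flags\s*=\s*([0-9a-fA-F]+)")
STAMP_RE = re.compile(r"Value\s+(CTS|MTS)\s*[:=]\s*(\S+)")


def find_suspect_lines(lines):
    """Return (line_number, kind, value) for lines that look corrupt."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        flags = FLAGS_RE.search(line)
        # Flags wider than 16 bits are treated as suspect (assumption).
        if flags and int(flags.group(1), 16) > 0xFFFF:
            suspects.append((number, "flags", flags.group(1)))
        stamp = STAMP_RE.search(line)
        # A double dash suggests a negative field (assumption).
        if stamp and "--" in stamp.group(2):
            suspects.append((number, stamp.group(1), stamp.group(2)))
    return suspects


if __name__ == "__main__":
    sample = [
        "Flags = 0002",
        "Flags = 7f9800000000",
        "Value CTS : 12-11--51",
        "Value MTS = 01-06--50",
    ]
    for hit in find_suspect_lines(sample):
        print(hit)
```

Comparing the hit counts per server might show whether the bad obits are confined to a few replicas.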

I'd question staying on eDir 8.8.1 at this point; that's old. Upgrading
to a current version may or may not help if there's already cruft in the
DIB, but I'd definitely plan to get these servers upgraded.

> Has anyone seen anything like this before? This is an older tree, and
> I'm wondering if these obits were created when time was either not
> synced, or was wonky in some way.

That'd show up with weird object creation timestamps or object modify
timestamps. Are you seeing any of those? If not, I don't think you have
(had) a time problem.

> I also believe there may have been a
> server improperly taken from the tree at some point in the past.

Annoying, but my guess is that this is not related.

> these last two items are purely speculation. Lastly, I believe that the
> transitive vectors (TVs) on the Root partition may be in need of
> attention as well. (At least some of these obits are in the Root replica
> ring.)

I'm not convinced that you have a TV problem. Whatever looks off in the
TVs could be a symptom of the corrupt objects.

> However, some of the obits in the Root replica ring have normal
> flag states (though there are a ton of them that are not processing,

These could be stuck behind the bad ones, so this may also be a symptom.

> hence my suspicion about the TVs). This weekend we're going to disable
> sync and try running dsrepair -ant on the replica ring in an effort to
> repair the TVs. Most likely after that (because it's my guess that won't
> fix everything, primarily due to these odd CTS and MTS times), we'll
> probably repair timestamps and declare a new epoch on that partition.

I'll be surprised if -ant does any good here. I'd also probably not
bother with repair timestamps, as that's not going to do anything for the
weird flags values. This is one of the times I'd start with an 'ndsrepair
-R' to see if it can figure out what to do with this DIB. That's kind of
a sledgehammer approach, but from what you're describing, I think you'll
be doing that, and possibly more, anyway.

> iMonitor is showing 626 errors and 631 errors when doing a health check
> on the Root partition on every server except for the server holding the
> Master replica for the ring.

If you have a good Master, you could collapse the ring down by removing
the other replicas, make sure the Master is clean, then recreate the
replicas. It's been a long time since I've had to do that, though.

There's also the "walk the Master around the ring" method of dealing with
stuck obits, which may be effective here. But, again, that's something I
haven't had to do in a very long time.

David Gersic dgersic_@_niu.edu
Knowledge Partner http://forums.novell.com