A week and a half ago, when the new semester started, we began seeing massive slowdowns, but only in the AM and only Mon-Thu. We thought a user back from the summer might have turned on a device that was causing a broadcast storm, but we could find no evidence of that with a protocol analyzer. We did, however, find a switch acting up. We replaced it and that appeared to help, but that was a Friday PM -- not a good test. We also noticed that whenever the problem existed, current disk requests were very high. Finally, today (a Monday AM), we put together the fact that only faculty/staff were affected by the problem, not students, and, more importantly, that this difference showed up even on the same machines. It was then that we realized that a volume (one accessed only by staff) was at fault. When this volume is dismounted, everything is quick.

After reading various forum threads, we concluded that we needed to do a pool rebuild with a purge. We started this 10 hours ago and it is approximately 10% complete according to the progress meter, yet the Estimated Time remaining has varied from 15 min to 8 hours. Currently it is at 3 hr 42 min; 5 hours ago it was 1+ hours.

This is a terabyte volume with about 180 GB of files and another 230 GB of purgeable files. So far the rebuild has processed 6.3 million system objects and performed 1.6 million block reads and 834,000 block writes.
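
As a rough sanity check on the progress meter, here is a naive straight-line extrapolation in Python using the figures above (10 hours elapsed, roughly 10% complete, 6.3 million system objects processed). The function names are made up for illustration. The key assumptions are that the rebuild proceeds at a constant rate and that the progress percentage measures the total work, neither of which we know to be true for this rebuild.

```python
def linear_eta_hours(elapsed_hours: float, percent_complete: float) -> float:
    """Remaining hours if progress continues at the average rate seen so far.

    Assumption: progress is linear in wall-clock time, which may not hold.
    """
    if not 0 < percent_complete <= 100:
        raise ValueError("percent_complete must be in (0, 100]")
    projected_total = elapsed_hours * 100.0 / percent_complete
    return projected_total - elapsed_hours


def average_rate_per_hour(count: float, elapsed_hours: float) -> float:
    """Average number of items handled per hour over the elapsed time."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    return count / elapsed_hours


if __name__ == "__main__":
    # Figures from this post: 10 hours in, about 10% done, 6.3 million objects.
    print("remaining hours:", linear_eta_hours(10, 10))
    print("objects per hour:", average_rate_per_hour(6_300_000, 10))
```

On those assumptions the remaining time would be about 90 hours, which is nowhere near the 3 hr 42 min currently displayed.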

-- Is there any way to predict the actual time it will take?
-- Once it has gone through all 413 GB of files (current and deleted/purgeable), does it speed up through the rest, or does the whole defined terabyte get read? Is the Overall Progress % calculated against the total potential volume size or against the actual used space, and are purgeable files included in the calculation? (Further info: this volume is on a Compellent SAN using thin provisioning, so there is NOT a terabyte of actual space allocated. It grows only as needed, so I can't see how the rebuild could even find a real terabyte of space.)
-- I am assuming this process cannot be cancelled, as no cancel option exists, correct?
-- Is it purging as it goes or does the pool rebuild happen first followed by purge?
-- Would there have been a way to do this in a less disruptive way?
-- What could happen to a volume or pool to make it exhibit the symptoms we experienced? Could a corrupted user home directory throw it into such disarray, and could it then return to normal when that user logged out?
-- What actually can cause this?

This is an NW6 SP5 box.

Thanks for any insight.