I've posted a few snapshot queries on this list, but thought I would post one giant summation of my issues with NSS Data Pool Snapshots.

If you will allow me to vent: I am completely frustrated with NSS Data Pool Snapshots. I've tried them on at least 8 servers, and of those 8 only two run reliably. The two that run reliably are small servers with minimal usage (under 300 GB with no more than 15 users connecting). On the others, snapshots fail with abends in MM.NLM and/or NSS pool deactivations. Once this occurs I cannot reactivate a pool, even after a reboot, without deleting all snapshots and starting over. HOW FRUSTRATING!

I have an SR open with Novell on the issue, but so far they seem as stumped as I am. What I can't understand is why I have so many issues. I don't believe I am doing anything exotic.

I mean, isn't there anyone else out here using these? How have snapshots worked for you? Why am I the only one with issues?

A little background:

I want to use snapshots to provide quick file restores without going to tape. My original plan was to keep 28 snapshots per pool (2 per day for 14 days). This proved unstable, and Novell thought the number of snapshots might be the cause, so I reduced it to 1 per day for 14 days. I have followed best practices and keep the snapshot pool separate from my data pool. I always delete the oldest snapshot first.
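
For anyone trying to picture the rotation, here is a rough sketch of the retention logic in Python. It is only an illustration: the function name, the `keep` parameter, and the example dates are made up, and it does not call any NSS or NetWare tooling. It just works out which snapshots fall outside the retention window, oldest first.

```python
from datetime import date, timedelta


def snapshots_to_delete(snapshot_dates, keep=14):
    """Return the snapshots that exceed the retention count, oldest first.

    snapshot_dates: creation dates of the existing snapshots (hypothetical input).
    keep: how many of the newest snapshots to retain (14 for 1 per day for 14 days).
    """
    ordered = sorted(snapshot_dates)      # oldest snapshot comes first
    excess = len(ordered) - keep          # how many are over the limit
    return ordered[:excess] if excess > 0 else []


if __name__ == "__main__":
    # Hypothetical example: 16 daily snapshots exist, so the 2 oldest are due for deletion.
    existing = [date(2024, 1, 1) + timedelta(days=i) for i in range(16)]
    for snap in snapshots_to_delete(existing):
        print("delete snapshot taken on", snap.isoformat())
```

With the original plan of 2 snapshots per day for 14 days, the same logic would apply with `keep=28`.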

The snapshots have failed on iSCSI to various NetApp filers. The snapshots have failed with Fibre Channel to various NetApp filers. The snapshots have failed with direct attached storage. The snapshots have failed with NetWare 6.5 SP2, 6.5 SP3, 6.5 SP4A, and 6.5 SP4A with the SP5 MM.NLM (per Novell's recommendations). I have not yet tested with the pre-SP6 MM.NLM Novell has provided (though based on this record I wonder if MM.NLM has ever worked at ALL!).

The storage pools have varied in size from 5 TB to 550 GB (the largest failing size). Snapshots do appear to work on two servers: one has 589 MB used and 3 users, and the other has 300 GB used and 12 users.

The failures:

I have noticed the following issues with snapshots:

1. MM.NLM continually eats more and more RAM until the server runs out (1.5 GB out of 3 GB of server RAM is the highest I have watched it consume). I have submitted a coredump to Novell for analysis of this behavior.

2. Any snapshot created on a server with a ZENworks Inventory Database (Sybase Adaptive Server/ZENworks Inventory 6.5) corrupts the ZENworks Inventory Database. Fortunately I could use the snapshots to go back to the uncorrupted database and restore. I thought snapshot operations were supposed to be transparent to file access? How can this affect the database?

3. After a period of time (no longer than 14 days' worth of snapshots) MM.NLM abends on snapshot creation or deletion. Current Disk Requests rise to 1000+ and server I/O apparently halts.

4. On servers with active users, a data pool will sometimes deactivate itself once snapshots are configured. This does not happen without snapshots.

5. If MM.NLM abends or a pool deactivates, the only way to restore access for users is to set NSS /PoolDeact=all, then boot with server -na and issue mm snap delete all (wiping out all existing snapshots).

6. On one occasion I saw a 7-day-old snapshot that, once activated, did not contain the data from 7 days ago. In fact, it appeared to contain only the data from the present time (despite the snapshot pool indicating 13 GB used).

7. MM SNAP RENAME does not appear to function correctly. If I use MM SNAP RENAME on a snapshot and then delete the renamed snapshot, the space on the snapshot pool is never reclaimed.

8. Sometimes MM SNAP DELETE reports that it has deleted a snapshot, but the space used on the snapshot pool does not go down. In this case I usually delete the snapshot pool and recreate it (as it holds only snapshots).

Thanks for letting me rant.